* Possible Idea with filesystem buffering. @ 2002-01-20 9:04 Shawn 2002-01-20 11:31 ` Hans Reiser 2002-01-20 15:49 ` Anton Altaparmakov 0 siblings, 2 replies; 92+ messages in thread From: Shawn @ 2002-01-20 9:04 UTC (permalink / raw) To: linux-kernel I've noticed that XFS has a separate pagebuf_daemon to handle caching/buffering. Why not make a kernel page/caching daemon (kpagebufd) for other filesystems to use, so that each filesystem can use a kernel daemon interface to handle buffering and caching? I found that XFS's buffering/caching significantly reduced I/O load on the system (with riel's rmap11b + rml's preempt patches and Andre's IDE patch). But I've not been able to achieve the same speed results with ReiserFS :-( Just as we have a filesystem (VFS) layer, why not have a buffering/caching layer for the filesystems to use in conjunction with the VM? Comments, suggestions, flames welcome ;) Shawn. ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 9:04 Possible Idea with filesystem buffering Shawn @ 2002-01-20 11:31 ` Hans Reiser 2002-01-20 13:56 ` Rik van Riel 2002-01-20 22:45 ` Shawn Starr 2002-01-20 15:49 ` Anton Altaparmakov 1 sibling, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 11:31 UTC (permalink / raw) To: Shawn; +Cc: linux-kernel In version 4 of reiserfs, our plan is to implement writepage such that it does not write the page but instead pressures the reiser4 cache and marks the page as recently accessed. This is Linus's preferred method of doing that. Personally, I think that makes writepage the wrong name for that function, but I must admit it gets the job done, and it leaves writepage as the right name for all filesystems that don't manage their own cache, which is most of them. Hans Shawn wrote: >I've noticed that XFS has a separate pagebuf_daemon to handle >caching/buffering. > >Why not make a kernel page/caching daemon (kpagebufd) for other >filesystems to use, so that each filesystem can use a kernel daemon >interface to handle buffering and caching? > >I found that XFS's buffering/caching significantly reduced I/O load on the >system (with riel's rmap11b + rml's preempt patches and Andre's IDE >patch). > >But I've not been able to achieve the same speed results with ReiserFS :-( > >Just as we have a filesystem (VFS) layer, why not have a buffering/caching >layer for the filesystems to use in conjunction with the VM? > There is hostility to this from one of the VM maintainers. He is concerned that separate caches were what they had before and that they behaved badly. I think they were simply coded wrong the last time: the pressure on the subcaches was uneven, with some caches only getting pressure when the other caches couldn't free anything, so of course it behaved badly. > > >Comments, suggestions, flames welcome ;) > >Shawn. 
> >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 11:31 ` Hans Reiser @ 2002-01-20 13:56 ` Rik van Riel 2002-01-20 14:21 ` Hans Reiser 2002-01-20 22:45 ` Shawn Starr 1 sibling, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-20 13:56 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel On Sun, 20 Jan 2002, Hans Reiser wrote: > In version 4 of reiserfs, our plan is to implement writepage such that > it does not write the page but instead pressures the reiser4 cache and > marks the page as recently accessed. What is this supposed to achieve ? > Personally, I think that makes writepage the wrong name for that > function, but I must admit it gets the job done, And what job would that be ? regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 13:56 ` Rik van Riel @ 2002-01-20 14:21 ` Hans Reiser 2002-01-20 15:13 ` Rik van Riel 2002-01-20 17:51 ` Mark Hahn 0 siblings, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 14:21 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn, linux-kernel, Josh MacDonald Write clustering is one thing it achieves. When we flush a slum, the cost of the seek so far outweighs the transfer cost that we should transfer FLUSH_SIZE (imagined to be something like 64 or 16, or at least 8) adjacent (in tree order) nodes to disk at the same time. There are many ways in which LRU is only an approximation to the optimum; this is one of them. Flushing everything involved in a transaction, so that the buffers pinned in RAM (kept there so they don't have to be reread from disk when the transaction commits) can be unpinned, is another thing. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
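The write-clustering argument can be made concrete with a toy sketch: when one node must be flushed, also submit up to FLUSH_SIZE dirty neighbours in tree order, since one seek amortizes over the whole cluster. All names here (`flush_slum`, `struct node`, the stand-in `submit_write`) are invented for illustration, not reiser4's actual interfaces:

```c
#include <stddef.h>

/* Illustrative only: FLUSH_SIZE and these structures are made up for
 * the sketch, not taken from reiser4. */
#define FLUSH_SIZE 16

struct node {
    int blocknr;
    int dirty;
    struct node *next;          /* neighbour in tree order */
};

static int submitted[FLUSH_SIZE];
static int nsubmitted;

/* Stand-in for real block I/O: record what was written, clean the node. */
static void submit_write(struct node *n)
{
    submitted[nsubmitted++] = n->blocknr;
    n->dirty = 0;
}

/* Flush 'start' and up to FLUSH_SIZE - 1 adjacent dirty nodes, so one
 * seek pays for many transfers. Returns the number of nodes written. */
int flush_slum(struct node *start)
{
    struct node *n;
    int count = 0;

    for (n = start; n && n->dirty && count < FLUSH_SIZE; n = n->next) {
        submit_write(n);
        count++;
    }
    return count;
}
```

A plain per-page writepage would pay one seek per node; the cluster writes the whole dirty run for roughly the cost of one.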
* Re: Possible Idea with filesystem buffering. 2002-01-20 14:21 ` Hans Reiser @ 2002-01-20 15:13 ` Rik van Riel 2002-01-20 21:15 ` Hans Reiser 2002-01-20 17:51 ` Mark Hahn 1 sibling, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-20 15:13 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel, Josh MacDonald On Sun, 20 Jan 2002, Hans Reiser wrote: > Write clustering is one thing it achieves. > > Flushing everything involved in a transaction ... is another thing. Agreed on these points, but you really HAVE TO work towards flushing the page ->writepage() gets called for. Think about your typical PC, with memory in ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. If we are short on DMA pages we will end up calling ->writepage() on a DMA page. If the filesystem ends up writing completely unrelated pages and marking the DMA page in question referenced the VM will go in a loop until the filesystem finally gets around to making a page in the (small) DMA zone freeable ... regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 15:13 ` Rik van Riel @ 2002-01-20 21:15 ` Hans Reiser 2002-01-20 21:24 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-20 21:15 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn, linux-kernel, Josh MacDonald Rik van Riel wrote: >On Sun, 20 Jan 2002, Hans Reiser wrote: > >>Write clustering is one thing it achieves. >> >>Flushing everything involved in a transaction ... is another thing. >> > >Agreed on these points, but you really HAVE TO work towards >flushing the page ->writepage() gets called for. > >Think about your typical PC, with memory in ZONE_DMA, >ZONE_NORMAL and ZONE_HIGHMEM. If we are short on DMA pages >we will end up calling ->writepage() on a DMA page. > >If the filesystem ends up writing completely unrelated pages >and marking the DMA page in question referenced the VM will >go in a loop until the filesystem finally gets around to >making a page in the (small) DMA zone freeable ... > This is a bug in VM design, yes? It should signal that it needs the particular page written, which probably means that it should use writepage only when it needs that particular page written, and should otherwise check to see if the filesystem supports something like pressure_fs_cache(), yes? > > >regards, > >Rik > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:15 ` Hans Reiser @ 2002-01-20 21:24 ` Rik van Riel 2002-01-20 21:30 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-20 21:24 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel, Josh MacDonald On Mon, 21 Jan 2002, Hans Reiser wrote: > Rik van Riel wrote: > >On Sun, 20 Jan 2002, Hans Reiser wrote: > >Agreed on these points, but you really HAVE TO work towards > >flushing the page ->writepage() gets called for. > > > >Think about your typical PC, with memory in ZONE_DMA, > >ZONE_NORMAL and ZONE_HIGHMEM. If we are short on DMA pages > >we will end up calling ->writepage() on a DMA page. > > > >If the filesystem ends up writing completely unrelated pages > >and marking the DMA page in question referenced the VM will > >go in a loop until the filesystem finally gets around to > >making a page in the (small) DMA zone freeable ... > > This is a bug in VM design, yes? It should signal that it needs the > particular page written, which probably means that it should use > writepage only when it needs that particular page written, That is exactly what the VM does. > and should otherwise check to see if the filesystem supports something > like pressure_fs_cache(), yes? That's incompatible with the concept of memory zones. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:24 ` Rik van Riel @ 2002-01-20 21:30 ` Hans Reiser 2002-01-20 21:40 ` Rik van Riel 2002-01-21 15:29 ` Eric W. Biederman 0 siblings, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 21:30 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn, linux-kernel, Josh MacDonald Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Rik van Riel wrote: >> >>>On Sun, 20 Jan 2002, Hans Reiser wrote: >>> > >>>Agreed on these points, but you really HAVE TO work towards >>>flushing the page ->writepage() gets called for. >>> >>>Think about your typical PC, with memory in ZONE_DMA, >>>ZONE_NORMAL and ZONE_HIGHMEM. If we are short on DMA pages >>>we will end up calling ->writepage() on a DMA page. >>> >>>If the filesystem ends up writing completely unrelated pages >>>and marking the DMA page in question referenced the VM will >>>go in a loop until the filesystem finally gets around to >>>making a page in the (small) DMA zone freeable ... >>> >>This is a bug in VM design, yes? It should signal that it needs the >>particular page written, which probably means that it should use >>writepage only when it needs that particular page written, >> > >That is exactly what the VM does. > So basically you continue to believe that one cache manager shall rule them all, and in the darkness as to their needs, bind them. > > >>and should otherwise check to see if the filesystem supports something >>like pressure_fs_cache(), yes? >> > >That's incompatible with the concept of memory zones. > Care to explain more? > > >regards, > >Rik > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:30 ` Hans Reiser @ 2002-01-20 21:40 ` Rik van Riel 2002-01-20 21:49 ` Hans Reiser 2002-01-21 15:29 ` Eric W. Biederman 1 sibling, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-20 21:40 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel, Josh MacDonald On Mon, 21 Jan 2002, Hans Reiser wrote: > >>and should otherwise check to see if the filesystem supports something > >>like pressure_fs_cache(), yes? > > > >That's incompatible with the concept of memory zones. > > Care to explain more? On basically any machine we'll have multiple memory zones. Each of those memory zones has its own free list and each of the zones can get low on free pages independently of the other zones. This means that if the VM asks to get a particular page freed, at the very minimum you need to make a page from the same zone freeable. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
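Rik's constraint can be modelled in a few lines: each zone has its own free count and watermark, so freeing a page relieves pressure only in that page's zone. A minimal sketch with invented numbers and names, not the kernel's real structures:

```c
/* Toy model of per-zone free lists; not the kernel's actual types. */
enum zone_id { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

struct zone {
    int free_pages;
    int pages_min;              /* below this, the zone is under pressure */
};

static struct zone zones[NR_ZONES] = {
    [ZONE_DMA]     = { .free_pages = 2,   .pages_min = 8 },
    [ZONE_NORMAL]  = { .free_pages = 500, .pages_min = 64 },
    [ZONE_HIGHMEM] = { .free_pages = 900, .pages_min = 64 },
};

static int zone_is_low(enum zone_id z)
{
    return zones[z].free_pages < zones[z].pages_min;
}

/* Freeing a page only helps the zone that page belongs to. */
static void free_page_in(enum zone_id z)
{
    zones[z].free_pages++;
}
```

Freeing any number of ZONE_HIGHMEM pages leaves `zone_is_low(ZONE_DMA)` true, which is exactly why a filesystem that writes unrelated pages instead of the requested DMA page makes no progress for the VM.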
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:40 ` Rik van Riel @ 2002-01-20 21:49 ` Hans Reiser 2002-01-20 22:00 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 21:49 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn, linux-kernel, Josh MacDonald Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>>>and should otherwise check to see if the filesystem supports something >>>>like pressure_fs_cache(), yes? >>>> >>>That's incompatible with the concept of memory zones. >>> >>Care to explain more? >> > >On basically any machine we'll have multiple memory zones. > >Each of those memory zones has its own free list and each >of the zones can get low on free pages independently of the >other zones. > >This means that if the VM asks to get a particular page >freed, at the very minimum you need to make a page from the >same zone freeable. > >regards, > >Rik > I'll discuss with Josh tomorrow how we might implement support for that. A clean and simple mechanism does not come to my mind immediately. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:49 ` Hans Reiser @ 2002-01-20 22:00 ` Rik van Riel 2002-01-21 0:10 ` Matt 2002-01-21 9:13 ` Horst von Brand 2 siblings, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-20 22:00 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel, Josh MacDonald On Mon, 21 Jan 2002, Hans Reiser wrote: > >This means that if the VM asks to get a particular page > >freed, at the very minimum you need to make a page from the > >same zone freeable. > > I'll discuss with Josh tomorrow how we might implement support for that. > A clean and simple mechanism does not come to my mind immediately. Note that in order to support more reliable allocation of contiguous memory areas (e.g. for loading modules) we may also want to add some simple form of defragmentation to the VM. If you really want to make life easy for the VM, ->writepage() should work towards making the page it is called for freeable. You probably want to do this since an easy VM is good for performance and it would be embarrassing if reiserfs had the worst performance under load simply due to bad interaction with other subsystems... kind regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:49 ` Hans Reiser 2002-01-20 22:00 ` Rik van Riel @ 2002-01-21 0:10 ` Matt 2002-01-21 0:57 ` Hans Reiser ` (2 more replies) 2002-01-21 9:13 ` Horst von Brand 2 siblings, 3 replies; 92+ messages in thread From: Matt @ 2002-01-21 0:10 UTC (permalink / raw) To: Hans Reiser; +Cc: Rik van Riel, Shawn, linux-kernel, Josh MacDonald On Mon, Jan 21, 2002 at 12:49:27AM +0300, Hans Reiser wrote: > Rik van Riel wrote: [snip snip] >> On basically any machine we'll have multiple memory zones. >> Each of those memory zones has its own free list and each of the >> zones can get low on free pages independently of the other zones. >> This means that if the VM asks to get a particular page freed, at >> the very minimum you need to make a page from the same zone >> freeable. >> regards, >> Rik > I'll discuss with Josh tomorrow how we might implement support for that. > A clean and simple mechanism does not come to my mind immediately. > Hans I know this sounds semi-evil, but can't you just drop another non-dirty page and do a copy if you need the page you have been asked to write out? Because if you have no non-dirty pages around you'd probably have to drop the page anyway at some stage.. matt ^ permalink raw reply [flat|nested] 92+ messages in thread
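Matt's "semi-evil" idea amounts to relocating the page: sacrifice some other clean page and copy the wanted page's contents into it, so the original frame (say, in ZONE_DMA) becomes freeable without any I/O. A hypothetical sketch; the names and structure here are invented, not kernel code:

```c
#include <string.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

/* Toy page frame; not the kernel's struct page. */
struct toy_page {
    char data[TOY_PAGE_SIZE];
    int dirty;
    int free;
};

/* Move 'victim's contents into 'spare' (which must be clean), freeing
 * victim's frame without writing anything to disk. Returns the page now
 * holding the data, or NULL if the spare page cannot be dropped. */
struct toy_page *relocate_page(struct toy_page *victim, struct toy_page *spare)
{
    if (spare->dirty)
        return NULL;            /* nothing clean to sacrifice */
    memcpy(spare->data, victim->data, TOY_PAGE_SIZE);
    spare->dirty = victim->dirty;
    spare->free = 0;
    victim->free = 1;           /* this frame is now reclaimable */
    return spare;
}
```

The catch, raised in the reply, is that the copy is pure overhead whenever the VM did not actually need that specific frame, which is why the filesystem would rather be told whether the write is mandatory.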
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:10 ` Matt @ 2002-01-21 0:57 ` Hans Reiser 2002-01-21 1:28 ` Anton Altaparmakov 2002-01-21 9:21 ` Horst von Brand 2 siblings, 0 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-21 0:57 UTC (permalink / raw) To: Matt; +Cc: Rik van Riel, Shawn, linux-kernel, Josh MacDonald Matt wrote: >On Mon, Jan 21, 2002 at 12:49:27AM +0300, Hans Reiser wrote: >[snip] >I know this sounds semi-evil, but can't you just drop another non-dirty >page and do a copy if you need the page you have been asked to >write out? Because if you have no non-dirty pages around you'd >probably have to drop the page anyway at some stage.. > > matt > Yes, but it is seriously suboptimal to do copies if not really needed. So, if we really must, then yes, but must we? Would be best if the VM told us if we really must write that page. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:10 ` Matt 2002-01-21 0:57 ` Hans Reiser @ 2002-01-21 1:28 ` Anton Altaparmakov 2002-01-21 2:29 ` Shawn Starr 2002-01-21 9:21 ` Horst von Brand 2 siblings, 1 reply; 92+ messages in thread From: Anton Altaparmakov @ 2002-01-21 1:28 UTC (permalink / raw) To: Hans Reiser; +Cc: Matt, Rik van Riel, Shawn, linux-kernel, Josh MacDonald [snip] At 00:57 21/01/02, Hans Reiser wrote: [snip] > Would be best if VM told us if we really must write that page. In theory the VM should never call writepage unless the page must be written out... But I agree with you that it would be good to be able to distinguish the two cases. I have been thinking about this a bit in the context of NTFS TNG, but I think that it would be better to have a generic solution rather than every fs doing its own copy of the same thing. I envisage that there is a flush daemon which just walks around writing pages to disk in the background (there could be one per fs, or a generic one which filesystems register with; at their option they could have their own of course) in order to keep the number of dirty pages low and in order to minimize data loss in the event of system/power failure. This daemon requires several interfaces though, with regards to journalling fs. The daemon should have an interface where the fs can say "commit pages in this list NOW and do not return before done"; also a barrier operation would be required in a journalling context. A transactions interface would be ideal, where the fs can submit whole transactions consisting of writing out a list of pages and optional write barriers; e.g. write journal pages x, y, z, barrier, write metadata, perhaps barrier, finally write data pages a, b, c. Simple file systems could just not bother at all and rely on the flush daemon calling the fs to write the pages. Obviously when this daemon writes pages the pages will continue being there. 
OTOH, if the VM calls writepage because it needs to free memory then writepage must write and clean the page. So, yes, a parameter to writepage would be great in this context. Alternatively we could have ->writepage and ->flushpage (or pick your favourite two names), one being an optional writeout and one a forced writeout... I like the parameter to writepage idea better but in the end it doesn't really matter that much I would suspect... Best regards, Anton -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/ ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 92+ messages in thread
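The "parameter to writepage" idea could look roughly like this; the names (`wb_reason`, `WB_RECLAIM`, `toy_writepage`) are invented for the sketch and are not the 2.4 address_space_operations interface:

```c
/* Hypothetical two-case writepage: the VM says whether it is doing
 * background writeback (the fs may defer and pressure its own cache) or
 * reclaim (this very page must be cleaned so its frame can be freed). */
enum wb_reason {
    WB_BACKGROUND,              /* optional writeout */
    WB_RECLAIM,                 /* page must become freeable now */
};

struct toy_page {
    int dirty;
};

/* Returns 0 if the page was cleaned, 1 if the writeout was deferred. */
int toy_writepage(struct toy_page *page, enum wb_reason reason)
{
    if (reason == WB_RECLAIM) {
        /* No choice: write it out (I/O elided here) and clean it. */
        page->dirty = 0;
        return 0;
    }
    /* Background pressure: a cache-managing fs may pick better pages
     * (e.g. a whole slum) and leave this one dirty for now. */
    return 1;
}
```

The alternative mentioned above, separate ->writepage and ->flushpage methods, encodes the same distinction in the method name instead of a flag.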
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:28 ` Anton Altaparmakov @ 2002-01-21 2:29 ` Shawn Starr 2002-01-21 19:15 ` Shawn Starr 0 siblings, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-21 2:29 UTC (permalink / raw) To: Anton Altaparmakov Cc: Hans Reiser, Matt, Rik van Riel, linux-kernel, Josh MacDonald On Mon, 21 Jan 2002, Anton Altaparmakov wrote: > [snip] > At 00:57 21/01/02, Hans Reiser wrote: > [snip] > > Would be best if VM told us if we really must write that page. > > In theory the VM should never call writepage unless the page must be written > out... > [snip] > > Obviously when this daemon writes pages the pages will continue being > there. 
OTOH, if the VM calls writepage because it needs to free memory > then writepage must write and clean the page. >

If they are dirty and written immediately to the disk they can be cleaned from the queue. It would be nice if there were some way to have a checksum verify the data was written back, then wipe it from the queue.

As an example: 5 operations requested, 2 already in queue.

In queue) DIRTY write to disk (this task has been in the queue for a while)
In queue) not 'old' memory but must be written to disk

pending queue:
1) read operation
2) read operation
3) write operation
4) write operation

The daemon should re-sort by priority: write dirty pages to disk, then write any other pages left in the queue, then get to the read pages.

Notes:

If there is only one operation in the queue (say a write) and nothing else comes along, then the daemon should force-write the data back to disk after a timeout period (the memory in the slot becomes dirty).

If there are too many tasks in the queue and another one requires more memory than what's left in the buffer/cache, the daemon could store the request in swap memory and put it in the queue; if the request is a write request it would still have higher priority than any read requests and get completed quickly, allowing the remaining queue events to complete.

Example:

ReiserFS:
Operation A. Write (10K)
Operation B. Read (200K)
Operation C. Write (160K)

XFS:
Operation A. Read (63K)
Operation B. Read (3K)
Operation C. Write (10K)

EXT3:
Operation A. Write (290K)
Operation B. Write (90K)
Operation C. Read (3K)

The kpagebuf daemon (or whatever name) would get all these requests and sort out what needs to be done first. As long as there's buffer/cache memory free, the write operations would be done as fast as possible, verified by some checksum and purged from the queue. If there's no cache/buffer memory free, then all queued writes, regardless of being in swap or cache/buffer, need to be written to disk. 
So:
kpagebuf queue (total available buffer/cache memory is say 512K)

EXT3 Write (290K)
ReiserFS Write (160K)
ReiserFS Write (10K)
XFS Write (10K)
EXT3 Write (90K) - Goes in swap because total > 512K (dirty x2 state)
ReiserFS Read (200K) - Swap (dirty x2)
XFS Read (63K) - Swap (dirty x2)
XFS Read (3K) - Swap (dirty x2)
EXT3 Read (3K) - Swap (dirty x2)

* The daemon would check, in order of filesystem registration, whose requests should be in the read queue first.

* The daemon should maximize the amount of memory stored in buffer/cache to try to prevent write requests having to go into swap.

In the above queue, we have a lot of read operations and one write operation in swap. Clean out the write operations, since they are now dirty (because there's no room for more operations in the buffer/cache). Move the swapped write operation to the top of the queue and get rid of it. Move the read operations from swap to the queue since there is room again. ** NOTE ** Because those read requests are now dirty they MUST be dealt with, or they'll get stuck in the queue with more write requests overtaking them.

Maybe I've lost it but that's how I see it ;)

Shawn. ^ permalink raw reply [flat|nested] 92+ messages in thread
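The ordering rules sketched above (writes ahead of reads, and starved requests promoted so they cannot be overtaken forever) can be expressed as a comparator. The names, fields, and aging rule below are illustrative guesses at the scheme, not an existing interface:

```c
#include <stdlib.h>

enum req_type { REQ_WRITE, REQ_READ };

struct req {
    enum req_type type;
    int age;                    /* higher = has waited longer */
    int kbytes;                 /* request size, as in the examples */
};

/* Writes come first; within a class, the oldest request wins, so a
 * request that keeps getting overtaken eventually reaches the front. */
static int req_cmp(const void *pa, const void *pb)
{
    const struct req *a = pa, *b = pb;

    if (a->type != b->type)
        return a->type == REQ_WRITE ? -1 : 1;
    return b->age - a->age;
}

void sort_queue(struct req *q, size_t n)
{
    qsort(q, n, sizeof(*q), req_cmp);
}
```

This captures only the priority ordering; the swap-spill ("dirty x2") bookkeeping in the text would sit on top, tracking which entries exceeded the buffer budget.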
* Re: Possible Idea with filesystem buffering. 2002-01-21 2:29 ` Shawn Starr @ 2002-01-21 19:15 ` Shawn Starr 2002-01-22 22:02 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-21 19:15 UTC (permalink / raw) To: linux-kernel Nobody wants to comment on this? :( Shawn. On Sun, 2002-01-20 at 21:29, Shawn Starr wrote: [snip] ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 19:15 ` Shawn Starr @ 2002-01-22 22:02 ` Hans Reiser 0 siblings, 0 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-22 22:02 UTC (permalink / raw) To: Shawn Starr; +Cc: linux-kernel Shawn, I didn't respond to this because it seems like you are mixing in issues relating to the elevator code into this, and so I don't really understand you. Hans Shawn Starr wrote: >Nobody wants to comment on this? :( > >Shawn. > >On Sun, 2002-01-20 at 21:29, Shawn Starr wrote: > >>On Mon, 21 Jan 2002, Anton Altaparmakov wrote: >> >>>[snip] >>>At 00:57 21/01/02, Hans Reiser wrote: >>>[snip] >>> > Would be best if VM told us if we really must write that page. >>> >>>In theory the VM should never call writepage unless the page must be writen >>>out... >>> >>>But I agree with you that it would be good to be able to distinguish the >>>two cases. I have been thinking about this a bit in the context of NTFS TNG >>>but I think that it would be better to have a generic solution rather than >>>every fs does their own copy of the same thing. I envisage that there is a >>>flush daemon which just walks around writing pages to disk in the >>>background (there could be one per fs, or a generic one which fs register >>>with, at their option they could have their own of course) in order to keep >>>the number of dirty pages low and in order to minimize data loss on the >>>event of system/power failure. >>> >>>This demon requires several interfaces though, with regards to journalling >>>fs. The daemon should have an interface where the fs can say "commit pages >>>in this list NOW and do not return before done", also a barrier operation >>>would be required in journalling context. A transactions interface would be >>>ideal, where the fs can submit whole transactions consisting of writing out >>>a list of pages and optional write barriers; e.g. 
write journal pages x, y, >>>z, barrier, write metadata, perhaps barrier, finally write data pages a, b, >>>c. Simple file systems could just not bother at all and rely on the flush >>>daemon calling the fs to write the pages. >>> >>>Obviously when this daemon writes pages the pages will continue being >>>there. OTOH, if the VM calls writepage because it needs to free memory >>>then writepage must write and clean the page. >>> >>If they are dirty and written immediately to the disk they can be cleaned >>from the queue. It would be nice if there was some way to have a checksum >>verify the data was written back, then wipe it from the queue. >> >>As an example: 5 operations requested, 2 already in queue. >> >>In queue) DIRTY write to disk (this task has been in the queue for a >>while) >> >>In queue) not 'old' memory but must be written to disk >> >>pending queue: >> >>1) read operation >>2) read operation >>3) write operation >>4) write operation >> >>The daemon should re-sort by priority: write dirty pages to disk, then write >>any other pages that are left in the queue, then get to the read pages. >> >> >>Notes: >> >>If there is only one operation in the queue (say write) and nothing else >>comes along, then the daemon should force-write the data back to disk >>after a period of timeout (the memory in the slot becomes dirty). >> >>If there are too many tasks in the queue and another one requires more >>memory than what's left in the buffer/cache, the daemon could request to >>store the request in swap memory and put it in the queue. If the request >>is a write request it would still have more priority than any read requests >>and get completed quickly, allowing the remaining queue events to >>complete. >> >>Example: >> >>ReiserFS: >> Operation A. Write (10K) >> Operation B. Read (200K) >> Operation C. Write (160K) >> >> >>XFS: >> Operation A. Read (63K) >> Operation B. Read (3k) >> Operation C. Write (10K) >> >> >>EXT3: >> Operation A. Write (290K) >> Operation B. 
Write (90K) >> Operation C. Read (3k) >> >>The kpagebuf daemon (or whatever name) would get all these requests and sort out >>what needs to be done first. As long as there's buffer/cache memory free, >>the write operations would be done as fast as possible, verified by some >>checksum and purged from the queue. If there's no cache/buffer memory >>free, then all write queues, regardless of being in swap or cache/buffer, need to be >>written to disk. >> >>So: >>kpagebuf queue (total available buffer/cache memory is say 512K) >> >> EXT3 Write (290K) >> ReiserFS Write (160K) >> ReiserFS Write (10K) >> XFS Write (10K) >> EXT3 Write (90K) - Goes in swap because total > 512K (Dirty x2 state) >> ReiserFS Read (200K) - Swap (dirty x2) >> XFS Read (63K) - Swap (dirty x2) >> XFS Read (3K) - Swap (dirty x2) >> EXT3 Read (3K) - Swap (dirty x2) >> >>* The daemon would check in order of filesystem registration for whose requests >>should be in the read queue first. >> >>* The daemon should maximize the amount of memory stored in buffer/cache to >>try to prevent write requests from having to go into swap. >> >>In the above queue, we have a lot of read operations and one write >>operation in swap. Clean out the write operations since they are now dirty >>(because there's no room for more operations in the buffer/cache). Move >>the swapped write operation to the top of the queue and get rid of it. >>Move the read operations from swap to queue since there is room again. ** >>NOTE ** because those read requests are now dirty they MUST be dealt with >>or they'll get stuck in the queue with more write requests overtaking >>them. >> >>Maybe I've lost it but that's how I see it ;) >> >>Shawn. 
>> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >>Please read the FAQ at http://www.tux.org/lkml/ >> > > >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:10 ` Matt 2002-01-21 0:57 ` Hans Reiser 2002-01-21 1:28 ` Anton Altaparmakov @ 2002-01-21 9:21 ` Horst von Brand 2 siblings, 0 replies; 92+ messages in thread From: Horst von Brand @ 2002-01-21 9:21 UTC (permalink / raw) To: Matt; +Cc: Hans Reiser, linux-kernel, Josh MacDonald Matt <matt@progsoc.uts.edu.au> said: [...] > i know this sounds semi-evil, but can't you just drop another non > dirty page and do a copy if you need the page you have been asked to > write out? because if you have no non dirty pages around you'd > probably have to drop the page anyway at some stage.. Better not. "Get rid of A", OK, copied to B. "Get rid of B", OK, copied to C. Lather. Rinse. Repeat. -- Horst von Brand http://counter.li.org # 22616 ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:49 ` Hans Reiser 2002-01-20 22:00 ` Rik van Riel 2002-01-21 0:10 ` Matt @ 2002-01-21 9:13 ` Horst von Brand 2 siblings, 0 replies; 92+ messages in thread From: Horst von Brand @ 2002-01-21 9:13 UTC (permalink / raw) To: Hans Reiser; +Cc: linux-kernel, Josh MacDonald Hans Reiser <reiser@namesys.com> said: > Rik van Riel wrote: [...] >On basically any machine we'll have multiple memory zones. > > > >Each of those memory zones has its own free list and each > >of the zones can get low on free pages independently of the > >other zones. > > > >This means that if the VM asks to get a particular page > >freed, at the very minimum you need to make a page from the > >same zone freeable. > I'll discuss with Josh tomorrow how we might implement support for that. > A clean and simple mechanism does not come to my mind immediately. Free the page you were asked to free, optionally free anything else you might want to. Anything else sounds like a gross violation of layering to me. The other way would be for the VM to say "Free at least <n> pages of this <list>", but that gives a complicated API. -- Horst von Brand http://counter.li.org # 22616 ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:30 ` Hans Reiser 2002-01-20 21:40 ` Rik van Riel @ 2002-01-21 15:29 ` Eric W. Biederman 1 sibling, 0 replies; 92+ messages in thread From: Eric W. Biederman @ 2002-01-21 15:29 UTC (permalink / raw) To: Hans Reiser; +Cc: Rik van Riel, Shawn, linux-kernel, Josh MacDonald Hans Reiser <reiser@namesys.com> writes: > > > >That is exactly what the VM does. > > > So basically you continue to believe that one cache manager shall rule them all, > > and in the darkness as to their needs, bind them. Hans, any other case generally sucks, and at best works well until the VM changes and then breaks. The worst VMs I have seen are the home-spun cache management routines for compressing filesystems. So trying for a generic solution is very good. I suspect it is easier to work out the semantics needed for reiserfs and xfs to do delayed writes in the page cache than to work out the semantics needed for having two competing VMs... Eric ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 14:21 ` Hans Reiser 2002-01-20 15:13 ` Rik van Riel @ 2002-01-20 17:51 ` Mark Hahn 2002-01-20 21:24 ` Hans Reiser 1 sibling, 1 reply; 92+ messages in thread From: Mark Hahn @ 2002-01-20 17:51 UTC (permalink / raw) To: Hans Reiser; +Cc: Rik van Riel, linux-kernel On Sun, 20 Jan 2002, Hans Reiser wrote: > Write clustering is one thing it achieves. When we flush a slum, the sure, that's fine. when the VM tells you to write a page, you're free to write *more*, but you certainly must give back that particular page. afaicr, this was the conclusion of the long-ago thread that you're referring to. regards, mark hahn. ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 17:51 ` Mark Hahn @ 2002-01-20 21:24 ` Hans Reiser 2002-01-20 21:32 ` Rik van Riel 2002-01-21 15:37 ` Eric W. Biederman 0 siblings, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 21:24 UTC (permalink / raw) To: Mark Hahn; +Cc: Rik van Riel, linux-kernel Mark Hahn wrote: >On Sun, 20 Jan 2002, Hans Reiser wrote: > >>Write clustering is one thing it achieves. When we flush a slum, the >> > >sure, that's fine. when the VM tells you to write a page, >you're free to write *more*, but you certainly must give back >that particular page. afaicr, this was the conclusion >of the long-ago thread that you're referring to. > >regards, mark hahn. > > > This is bad for use with internal nodes. It simplifies version 4 a bunch to assume that if a node is in cache, its parent is also. Not sure what to do about it, maybe we need to copy the node. Surely we don't want to copy it unless it is a DMA related page cleaning. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:24 ` Hans Reiser @ 2002-01-20 21:32 ` Rik van Riel 2002-01-21 15:37 ` Eric W. Biederman 1 sibling, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-20 21:32 UTC (permalink / raw) To: Hans Reiser; +Cc: Mark Hahn, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Mark Hahn wrote: > >On Sun, 20 Jan 2002, Hans Reiser wrote: > > > >>Write clustering is one thing it achieves. When we flush a slum, the > > > >sure, that's fine. when the VM tells you to write a page, > >you're free to write *more*, but you certainly must give back > >that particular page. afaicr, this was the conclusion > >of the long-ago thread that you're referring to. > > This is bad for use with internal nodes. It simplifies version 4 a > bunch to assume that if a node is in cache, its parent is also. Not > sure what to do about it, maybe we need to copy the node. Surely we > don't want to copy it unless it is a DMA related page cleaning. DMA isn't a special case, this thing can happen with ANY memory zone. Unless of course you decide to make reiserfs unsupported for NUMA machines... regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 21:24 ` Hans Reiser 2002-01-20 21:32 ` Rik van Riel @ 2002-01-21 15:37 ` Eric W. Biederman 1 sibling, 0 replies; 92+ messages in thread From: Eric W. Biederman @ 2002-01-21 15:37 UTC (permalink / raw) To: Hans Reiser; +Cc: Mark Hahn, Rik van Riel, linux-kernel Hans Reiser <reiser@namesys.com> writes: > Mark Hahn wrote: > > >On Sun, 20 Jan 2002, Hans Reiser wrote: > > > >> Write clustering is one thing it achieves. When we flush a slum, the > > > >sure, that's fine. when the VM tells you to write a page, > >you're free to write *more*, but you certainly must give back > > that particular page. afaicr, this was the conclusion of the long-ago thread > > that you're referring to. > > > >regards, mark hahn. > > > > > > > This is bad for use with internal nodes. It simplifies version 4 a bunch to > assume that if a node is in cache, its parent is also. Not sure what to do > about it, maybe we need to copy the node. Surely we don't want to copy it > unless it is a DMA related page cleaning. Increment the count on the parent page, and don't decrement it until the child goes away. This might need a notification from page_cache_release so you can decrement the count at the appropriate time. But internal nodes are ``meta'' data which has always had special freeing rules. Eric ^ permalink raw reply [flat|nested] 92+ messages in thread
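Eric's refcounting suggestion can be modeled in miniature: while any cached child holds a reference, the parent's count stays nonzero and the parent is not freeable. A hypothetical sketch — the `Page` class and its methods are invented for illustration; the real kernel analogue would involve `get_page` and the `page_cache_release` notification Eric mentions:

```python
# Toy model of pinning a parent (internal) node while any child is in
# cache: bump the parent's count when a child takes a reference, drop
# it when the child is released. Invented names, not kernel code.

class Page:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.count = name, parent, 0

    def get(self):
        self.count += 1
        if self.parent is not None:
            self.parent.get()      # parent cannot be freed before us

    def put(self):
        assert self.count > 0
        self.count -= 1
        if self.parent is not None:
            self.parent.put()      # stand-in for the release notification

    def freeable(self):
        return self.count == 0

root = Page("internal-node")
a = Page("leaf-a", parent=root)
b = Page("leaf-b", parent=root)
a.get()
b.get()
```

With both leaves referenced, the internal node is pinned; it only becomes freeable after the last child is released, which preserves the reiser4 invariant that a cached node's parent is also in cache.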
* Re: Possible Idea with filesystem buffering. 2002-01-20 11:31 ` Hans Reiser 2002-01-20 13:56 ` Rik van Riel @ 2002-01-20 22:45 ` Shawn Starr 2002-01-20 23:11 ` Rik van Riel 1 sibling, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-20 22:45 UTC (permalink / raw) To: Hans Reiser; +Cc: linux-kernel But why should each filesystem have to have a different method of buffering/caching? That just doesn't fit the layered model of the kernel IMHO. Shawn. On Sun, 20 Jan 2002, Hans Reiser wrote: > In version 4 of reiserfs, our plan is to implement writepage such that > it does not write the page but instead pressures the reiser4 cache and > marks the page as recently accessed. This is Linus's preferred method > of doing that. > > Personally, I think that makes writepage the wrong name for that > function, but I must admit it gets the job done, and it leaves writepage > as the right name for all filesystems that don't manage their own cache, > which is most of them. > > Hans > > Shawn wrote: > > >I've noticed that XFS's filesystem has a separate pagebuf_daemon to handle > >caching/buffering. > > > >Why not make a kernel page/caching daemon for other filesystems to use > >(kpagebufd) so that each filesystem can use a kernel daemon interface to > >handle buffering and caching. > > > >I found that XFS's buffering/caching significantly reduced I/O load on the > >system (with riel's rmap11b + rml's preempt patches and Andre's IDE > >patch). > > > >But I've not been able to achieve the same speed results with ReiserFS :-( > > > >Just as we have a filesystem (VFS) layer, why not have a buffering/caching > >layer for the filesystems to use in conjunction with the VM? > > > There is hostility to this from one of the VM maintainers. He is > concerned that separate caches were what they had before and they > behaved badly. I think that they simply coded them wrong the time before. 
The time before, the pressure on the subcaches was uneven, with > some caches only getting pressure if the other caches couldn't free > anything, so of course it behaved badly. > > > > > > >Comments, suggestions, flames welcome ;) > > > >Shawn. > > > >- > >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >the body of a message to majordomo@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html > >Please read the FAQ at http://www.tux.org/lkml/ > > > > > > > > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 22:45 ` Shawn Starr @ 2002-01-20 23:11 ` Rik van Riel 2002-01-20 23:40 ` Shawn Starr 2002-01-21 0:28 ` Hans Reiser 0 siblings, 2 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-20 23:11 UTC (permalink / raw) To: Shawn Starr; +Cc: Hans Reiser, linux-kernel On Sun, 20 Jan 2002, Shawn Starr wrote: > But why should each filesystem have to have a different method of > buffering/caching? that just doesn't fit the layered model of the > kernel IMHO. I think Hans will give up the idea once he realises the performance implications. ;) Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 23:11 ` Rik van Riel @ 2002-01-20 23:40 ` Shawn Starr 2002-01-20 23:48 ` Rik van Riel 2002-01-21 0:28 ` Hans Reiser 1 sibling, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-20 23:40 UTC (permalink / raw) To: Rik van Riel; +Cc: Hans Reiser, linux-kernel My worry is this. If we have different filesystems having their own page buffer/caching daemons we'll definitely introduce race conditions. Say I have 2 hard drives with ReiserFS and EXT3 and I'm copying data between the two; if each of them has its own daemon, it's going to get pretty messy, no? On Sun, 20 Jan 2002, Rik van Riel wrote: > On Sun, 20 Jan 2002, Shawn Starr wrote: > > > But why should each filesystem have to have a different method of > > buffering/caching? that just doesn't fit the layered model of the > > kernel IMHO. > > I think Hans will give up the idea once he realises the > performance implications. ;) > > Rik > -- > "Linux holds advantages over the single-vendor commercial OS" > -- Microsoft's "Competing with Linux" document > > http://www.surriel.com/ http://distro.conectiva.com/ > > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 23:40 ` Shawn Starr @ 2002-01-20 23:48 ` Rik van Riel 2002-01-21 0:44 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-20 23:48 UTC (permalink / raw) To: Shawn Starr; +Cc: Hans Reiser, linux-kernel On Sun, 20 Jan 2002, Shawn Starr wrote: > My worry is this. If we have different filesystems having their own page > buffer/caching daemons we'll definitely introduce race conditions. > > Say I have 2 hard drives with ReiserFS and EXT3 and I'm copying data between > the two and each of them has its own daemon, it's going to get pretty > messy, no? Each of the "cache daemons" will react differently to VM pressure, meaning the system will most definitely get out of balance. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 23:48 ` Rik van Riel @ 2002-01-21 0:44 ` Hans Reiser 2002-01-21 0:52 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 0:44 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Sun, 20 Jan 2002, Shawn Starr wrote: > >>My worry is this. If we have different filesystems having their own page >>buffer/caching daemons we'll definitely introduce race conditions. >> >>Say I have 2 hard drives with ReiserFS and EXT3 and I'm copying data between >>the two and each of them has its own daemon, it's going to get pretty >>messy, no? >> > >Each of the "cache daemons" will react differently to VM >pressure, meaning the system will most definitely get out >of balance. > >regards, > >Rik > Not if you provide a proper design of a master cache manager. Really, all you have to do is have the subcache managers designed to free the same number of pages on average in response to pressure, and to pressure them in proportion to their size, and it is pretty simple for the VM. Now of course, we can talk about all sorts of possible refinements of this, such as perhaps for some caches pressure in proportion to the square of their size is appropriate, or perhaps for some caches their pressure should be some multiple of some other cache's pressure (suppose the cost of fetching a page from disk is different from fetching a page over a network, and you have two different caches of pages, one from a disk backing store and one of pages from a network device backing store; then it IS optimal to keep the pages from the slower device longer). I would suggest that such refinements go in later though. Right now, we just want a simple interface for implementing the pressure response for Reiser4. More complex refinements can wait until after we ship 4.0, and can luxuriate in multitudinous benchmarks. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
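Hans's baseline proposal — the VM hands out a total amount of pressure and each subcache receives a share proportional to its current size — reduces to simple arithmetic. A hypothetical sketch, for illustration only; `distribute_pressure` and the cache names are invented:

```python
# Toy sketch of the "master cache manager" idea: the VM decides the
# total pressure, and each subcache gets a share proportional to its
# size. Illustrative names only; not kernel code.

def distribute_pressure(sizes, total_pressure):
    """Return pressure per subcache, proportional to subcache size."""
    total = sum(sizes.values())
    if total == 0:
        return {name: 0 for name in sizes}
    return {name: total_pressure * size / total
            for name, size in sizes.items()}

shares = distribute_pressure(
    {"reiserfs": 300, "ext3": 100, "page-cache": 600},
    total_pressure=100)
```

The refinements Hans mentions (pressure proportional to the square of the size, or weighted by backing-store cost) would just replace the linear weighting inside the dict comprehension.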
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:44 ` Hans Reiser @ 2002-01-21 0:52 ` Rik van Riel 2002-01-21 1:08 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 0:52 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Not if you provide a proper design of a master cache manager. > Really, all you have to do is have the subcache managers designed to > free the same number of pages on average in response to pressure, and > to pressure them in proportion to their size, and it is pretty simple > for VM. I take it you're volunteering to bring ext3, XFS, JFS, JFFS2, NFS, the inode & dentry cache and smbfs into shape so reiserfs won't get unbalanced ? regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:52 ` Rik van Riel @ 2002-01-21 1:08 ` Hans Reiser 2002-01-21 1:39 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 1:08 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Not if you provide a proper design of a master cache manager. >>Really, all you have to do is have the subcache managers designed to >>free the same number of pages on average in response to pressure, and >>to pressure them in proportion to their size, and it is pretty simple >>for VM. >> > >I take it you're volunteering to bring ext3, XFS, JFS, >JFFS2, NFS, the inode & dentry cache and smbfs into >shape so reiserfs won't get unbalanced ? > >regards, > >Rik > If they use writepage(), then the job of balancing cache cleaning is done; we just use writepage as their pressuring mechanism. Any FS that wants to optimize cleaning can implement a VFS method, and any FS that wants to optimize freeing can implement a VFS method, and all others can use the current generic VM mechanisms. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:08 ` Hans Reiser @ 2002-01-21 1:39 ` Rik van Riel 2002-01-21 11:10 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 1:39 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Rik van Riel wrote: > >I take it you're volunteering to bring ext3, XFS, JFS, > >JFFS2, NFS, the inode & dentry cache and smbfs into > >shape so reiserfs won't get unbalanced ? > If they use writepage(), then the job of balancing cache cleaning is > done, we just use writepage as their pressuring mechanism. > Any FS that wants to optimize cleaning can implement a VFS method, and > any FS that wants to optimize freeing can implement a VFS method, and > all others can use their generic VM current mechanisms. It seems you're still assuming that different filesystems will all see the same kind of load. Freeing cache (or at least, applying pressure) really is a job for the VM because none of the filesystems will have any idea exactly how busy the other filesystems are. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:39 ` Rik van Riel @ 2002-01-21 11:10 ` Hans Reiser 2002-01-21 12:12 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 11:10 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Rik van Riel wrote: >> > >>>I take it you're volunteering to bring ext3, XFS, JFS, >>>JFFS2, NFS, the inode & dentry cache and smbfs into >>>shape so reiserfs won't get unbalanced ? >>> > >>If they use writepage(), then the job of balancing cache cleaning is >>done, we just use writepage as their pressuring mechanism. >>Any FS that wants to optimize cleaning can implement a VFS method, and >>any FS that wants to optimize freeing can implement a VFS method, and >>all others can use their generic VM current mechanisms. >> > >It seems you're still assuming that different filesystems will >all see the same kind of load. > I don't understand this comment. > > >Freeing cache (or at least, applying pressure) really is a job >for the VM because none of the filesystems will have any idea >exactly how busy the other filesystems are. > I fully agree, and it is the point I have been making (poorly, since it has not been communicated) for as long as I have been discussing it with you. The VM should apply pressure to the caches. It should define an interface that subcache managers act in response to. The larger a subcache is, the greater the percentage of total memory pressure it should receive. The amount of memory pressure per unit of time should be determined by the VM. Note that there are two kinds of pressure, cleaning pressure and freeing pressure. I think that the structure appropriate for delegating them is the same, but someone may correct me. Also note that a unit of pressure is a unit of aging, not a unit of freeing/cleaning. 
The application of pressure does not necessarily free a page, it merely ages the subcache, which might or might not free a page depending on how much use is being made of what is in the subcache. Thus, a subcache receives pressure to grow from somewhere (things like write() in the case of ReiserFS), and pressure to shrink from VM, and VM exerts however much total pressure on all the subcaches is required to not run out of memory. The mechanism of going through pages, seeing what subcache they belong to, and pressuring that subcache, is a decent one (if a bit CPU cache expensive) for obtaining linearly proportional cache pressure. Since code inertia favors it, let's use it for now. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
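The aging-versus-freeing distinction Hans draws above can be made concrete: a unit of pressure ages every page in the subcache, but only pages that go unaccessed long enough are actually freed, so a heavily used subcache can absorb pressure without yielding pages. A toy model, with all names and thresholds invented for illustration:

```python
# Toy model of "a unit of pressure is a unit of aging, not a unit of
# freeing": pressure ages pages, accesses reset the clock, and only
# pages whose age passes a threshold are freed. Invented names only.

class SubCache:
    def __init__(self, pages):
        self.age = {p: 0 for p in pages}

    def access(self, page):
        self.age[page] = 0          # recent use resets the clock

    def pressure(self, units, max_age=3):
        """Apply aging; return the pages that actually got freed."""
        freed = []
        for _ in range(units):
            for page in list(self.age):
                self.age[page] += 1
                if self.age[page] > max_age:
                    del self.age[page]
                    freed.append(page)
        return freed

cache = SubCache(["hot", "cold"])
freed = []
for _ in range(4):                  # four rounds of one pressure unit
    cache.access("hot")             # "hot" is touched every round
    freed += cache.pressure(1)
```

Both pages receive the same pressure, but only the unused one is yielded, which is the interplay between pressure and accesses that Hans argues should determine the number of pages freed.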
* Re: Possible Idea with filesystem buffering. 2002-01-21 11:10 ` Hans Reiser @ 2002-01-21 12:12 ` Rik van Riel 2002-01-21 13:42 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 12:12 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > >It seems you're still assuming that different filesystems will > >all see the same kind of load. > > I don't understand this comment. [snip] > The VM should apply pressure to the caches. It should define an > interface that subcache managers act in response to. The larger a > subcache is, the more percentage of total memory pressure it should > receive. Wrong. If one filesystem is actively being used (eg. kernel compile) and the other filesystem's cache isn't being used (this one held the tarball of the kernel source) then the cache which is being used actively should receive less pressure than the cache which doesn't hold any active pages. We really want to evict the kernel tarball from memory while keeping the kernel source and object files resident. This is exactly the reason why each filesystem cannot manage its own cache ... it doesn't know anything about what the system as a whole is doing. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 12:12 ` Rik van Riel @ 2002-01-21 13:42 ` Hans Reiser 2002-01-21 13:54 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 13:42 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>>It seems you're still assuming that different filesystems will >>>all see the same kind of load. >>> >>I don't understand this comment. >> > >[snip] > >>The VM should apply pressure to the caches. It should define an >>interface that subcache managers act in response to. The larger a >>subcache is, the more percentage of total memory pressure it should >>receive. >> > >Wrong. If one filesystem is actively being used (eg. kernel >compile) and the other filesystem's cache isn't being used >(this one held the tarball of the kernel source) then the >cache which is being used actively should receive less >pressure than the cache which doesn't hold any active pages. > Pressure received is not equal to pages yielded. Think of pressure as a request to age on average one page. Not a request to free on average one page. The pressure received should be in proportion to the percentage of total memory pages in use by the subcache. The number of pages yielded should depend on the interplay of pressure received and accesses made. Does this make more sense now? > > >We really want to evict the kernel tarball from memory while >keeping the kernel source and object files resident. > If your example is based on untarring a kernel tarball from one filesystem to another, it is doomed, because you probably want to drop-behind the tarball contents. I think I know what you mean though, so let's use an example of one filesystem containing the files of a user who logs in once a week mostly to check his email that he doesn't get very often, and the other contains the files of a programmer who recompiles every 5 minutes. 
Is this what you intend? If so, I think the mechanism described above handles it. Perhaps writepage isn't the cleanest way to implement it though, maybe the page aging mechanism is where the call to the subcache belongs. > > >This is exactly the reason why each filesystem cannot manage >its own cache ... it doesn't know anything about what the >system as a whole is doing. > Each filesystem can be told how much aging pressure to exert on itself. The VM tracks what the system as a whole is doing, and the filesystem tracks what its subcache is doing, and the filesystem listens to the VM and acts accordingly. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 13:42 ` Hans Reiser @ 2002-01-21 13:54 ` Rik van Riel 2002-01-21 14:07 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 13:54 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Pressure received is not equal to pages yielded. ... The number of > pages yielded should depend on the interplay of pressure received and > accesses made. > > Does this make more sense now? Nice recipe for total chaos. You _know_ each filesystem will behave differently in this respect, it'll be impossible to get the VM balanced in this way... Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 13:54 ` Rik van Riel @ 2002-01-21 14:07 ` Hans Reiser 2002-01-21 17:21 ` Chris Mason 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 14:07 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Pressure received is not equal to pages yielded. ... The number of >>pages yielded should depend on the interplay of pressure received and >>accesses made. >> >>Does this make more sense now? >> > >Nice recipie for total chaos. You _know_ each filesystem will >behave differently in this respect, it'll be impossible to get >the VM balanced in this way... > >Rik > No, I don't _know_ that. Just because it got screwed up previously doesn't mean that no one can ever get it right. I think there should be well commented code with well commented templates and examples, and persons who abuse the interface should be handled like persons who abuse all the other interfaces. Optimal is optimal, and if VM's default is seriously suboptimal for a particular backing store then it simply shouldn't be used for that backing store. Write clustering, slum squeezing, block allocating, encrypting, committing transactions, all of these are serious things that should be pushed by memory pressure from a VM that delegates. This issue is no different from a human boss that refuses to delegate because he doesn't want to lose control, and he doesn't have the managerial skill that gives him the confidence that he can delegate well, and so nothing gets done well because he doesn't have the time to optimize all of the subordinates working for him as well as they could optimize themselves. Rik, your plan won't scale. Sure, you have the time needed to create one example template, but you cannot possibly create a single VM well optimized for every cache in the kernel. 
They each have different needs, different properties, different filesystem layouts. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
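Hans's distinction between pressure received and pages yielded can be sketched as a toy model. Everything here is hypothetical (the struct, the halving of the hot set); it only illustrates a subcache yielding pages as a function of both pressure and recent accesses, not any actual reiserfs code:

```c
#include <assert.h>

/* Toy model of Hans's point: a subcache receives N units of pressure
 * but yields pages based on the interplay of pressure and recent
 * accesses, rather than freeing exactly N pages. */
struct toy_cache {
    int pages;          /* pages currently held */
    int hot_pages;      /* pages accessed since the last scan */
};

/* Yield pages in response to pressure: cold pages go first; the hot
 * set is only touched once pressure exceeds the cold population. */
int toy_apply_pressure(struct toy_cache *c, int pressure)
{
    int cold = c->pages - c->hot_pages;
    int yielded = pressure < cold ? pressure : cold;

    /* Under severe pressure, surrender half the hot pages as well. */
    if (pressure > cold)
        yielded += c->hot_pages / 2;

    c->pages -= yielded;
    if (yielded > cold)
        c->hot_pages -= yielded - cold;
    return yielded;
}
```

Under mild pressure only cold pages go; only severe pressure reaches the hot set. That FS-local judgement is exactly what Rik objects to decentralizing.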
* Re: Possible Idea with filesystem buffering. 2002-01-21 14:07 ` Hans Reiser @ 2002-01-21 17:21 ` Chris Mason 2002-01-21 17:47 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Chris Mason @ 2002-01-21 17:21 UTC (permalink / raw) To: Hans Reiser, Rik van Riel; +Cc: Shawn Starr, linux-kernel On Monday, January 21, 2002 05:07:30 PM +0300 Hans Reiser <reiser@namesys.com> wrote: > Rik van Riel wrote: > >> On Mon, 21 Jan 2002, Hans Reiser wrote: >> >>> Pressure received is not equal to pages yielded. ... The number of >>> pages yielded should depend on the interplay of pressure received and >>> accesses made. >>> Ah, once the FS starts counting accesses, we get in trouble. The FS should strive to know only these 3 things: How to read useful data into a page How to flush a dirty page How to free a pinned page The VM records everything else, including how often a page is accessed, and which pages should be freed in response to memory pressure. Of course, the FS might have details on many more things such as write clustering, delayed allocations, or which pinned pages require tons of extra work to write out. This fools us into thinking the FS might be the best place to decide how to react under memory pressure, leading to a little VM in each FS. Everything gets cleaner if we push this info up to the VM in a generic fashion, instead of trying to push bits of the VM down into each filesystem. The FS should have no idea of what memory pressure is, down that path lies pain, suffering, and deadlocks against the journal ;-) If the VM is telling the FS to write a pinned page when there are unpinned pages that can be written with less cost, then we need to give the VM better hints about the actual cost of writing the pinned page. For periodic group flushes (delayed allocation, journal commits, etc), we need better throttling on dirty pages instead of just dirty buffers like we do now. 
I'm not delusional enough to think this will make all the vm<->journal nastiness go away, but it hopefully should be less painful than adding extra VM intelligence into each FS. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
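Chris's three-operation contract can be sketched as an ops table with all policy on the VM side. The names and structs are invented for illustration; they are not the 2.4 address_space operations:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the minimal interface Chris describes: the FS exports only
 * three operations and keeps no eviction policy of its own. */
struct toy_page;

struct toy_fs_ops {
    int (*read_page)(struct toy_page *page);    /* fill page from disk */
    int (*flush_page)(struct toy_page *page);   /* write a dirty page out */
    int (*release_page)(struct toy_page *page); /* unpin so the VM may free it */
};

struct toy_page {
    int dirty;
    int pinned;
    const struct toy_fs_ops *ops;
};

static int toy_flush(struct toy_page *p)   { p->dirty = 0;  return 0; }
static int toy_release(struct toy_page *p) { p->pinned = 0; return 0; }

const struct toy_fs_ops toy_ops = { NULL, toy_flush, toy_release };

/* All policy lives on the VM side: it decides the order of operations,
 * calls down through the ops, and reports whether the page is now free. */
int toy_vm_reclaim(struct toy_page *page)
{
    if (page->dirty && page->ops->flush_page(page))
        return 0;               /* flush failed, keep the page */
    if (page->pinned && page->ops->release_page(page))
        return 0;               /* still pinned, e.g. by the journal */
    return 1;                   /* VM may free the page */
}
```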
* Re: Possible Idea with filesystem buffering. 2002-01-21 17:21 ` Chris Mason @ 2002-01-21 17:47 ` Hans Reiser 2002-01-21 19:44 ` Chris Mason 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 17:47 UTC (permalink / raw) To: Chris Mason; +Cc: Rik van Riel, Shawn Starr, linux-kernel Chris Mason wrote: > >On Monday, January 21, 2002 05:07:30 PM +0300 Hans Reiser ><reiser@namesys.com> wrote: > >>Rik van Riel wrote: >> >>>On Mon, 21 Jan 2002, Hans Reiser wrote: >>> >>>>Pressure received is not equal to pages yielded. ... The number of >>>>pages yielded should depend on the interplay of pressure received and >>>>accesses made. >>>> > >Ah, once the FS starts counting accesses, we get in trouble. The FS should >strive to know only these 3 things: > >How to read useful data into a page >How to flush a dirty page >How to free a pinned page > You say this with all the dogma of someone working with code that currently does things a particular way. You provide no reasons though. > > >The VM records everything else, including how often a page is accessed, and >which pages should be freed in response to memory pressure. Of course, the >FS might have details on many more things such as write clustering, delayed >allocations, or which pinned pages require tons of extra work to write out. >This fools us into thinking the FS might be the best place to decide how to >react under memory pressure, leading to a little VM in each FS. > >Everything gets cleaner if we push this info up to the VM in a generic >fashion, instead of trying to push bits of the VM down into each >filesystem. >The FS should have no idea of what memory pressure is, down that path lies >pain, suffering, and deadlocks against the journal ;-) > >If the VM is telling the FS to write a pinned page when there are unpinned >pages that can be written with less cost, then we need to give the VM >better hints about the actual cost of writing the pinned page.
> Oh, this means a much more complicated interface, and it means that the VM must take into account the optimizations of each and every filesystem. Are you sure this isn't an unmaintainable centralized hell? In practice, will it really mean that optimizations specific to a particular filesystem will get ignored, because there will be too many of them to keep up with, and they will clutter each other up if implemented in one piece of code? Will programmers really be able to experiment? > > >For periodic group flushes (delayed allocation, journal commits, etc), we >need better throttling on dirty pages instead of just dirty buffers like we >do now. > >I'm not delusional enough to think this will make all the vm<->journal >nastiness go away, but it hopefully should be less painful than adding >extra VM intelligence into each FS. > >-chris > > > Say more about what you mean by better throttling on dirty pages, and how that meets the needs of slum squeezing, transaction committing, write clustering, etc. Last I remember, the generic write clustering code in VM didn't even understand packing localities.;-) Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 17:47 ` Hans Reiser @ 2002-01-21 19:44 ` Chris Mason 2002-01-21 20:41 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Chris Mason @ 2002-01-21 19:44 UTC (permalink / raw) To: Hans Reiser; +Cc: Rik van Riel, Shawn Starr, linux-kernel On Monday, January 21, 2002 08:47:00 PM +0300 Hans Reiser <reiser@namesys.com> wrote: > Chris Mason wrote: > >> >> On Monday, January 21, 2002 05:07:30 PM +0300 Hans Reiser >> <reiser@namesys.com> wrote: >> >>> Rik van Riel wrote: >>> >>>> On Mon, 21 Jan 2002, Hans Reiser wrote: >>>> >>>>> Pressure received is not equal to pages yielded. ... The number of >>>>> pages yielded should depend on the interplay of pressure received and >>>>> accesses made. >>>>> >> >> Ah, once the FS starts counting accesses, we get in trouble. The FS >> should strive to know only these 3 things: >> >> How to read useful data into a page >> How to flush a dirty page >> How to free a pinned page >> > You say this with the all the dogma of someone working with code that > currently does things a particular way. You provide no reasons though. ;-) In general, every bit of the VM we modify and copy into the FS will: A) break later on as the rest of the VM evolves B) perform poorly on hardware we don't have (numa). C) make odd, hard to trigger bugs due to strange interactions on large machines and certain work loads. D) require almost constant maintenance. And that is how it works right now. The journal is a subcache that does not respond to memory pressure the same way on all the journaled filesystems, and none of them are optimal. >> >> Everything gets cleaner if we push this info up to the VM in a generic >> fashion, instead of trying to push bits of the VM down into each >> filesystem. 
>> The FS should have no idea of what memory pressure is, down that path >> lies pain, suffering, and deadlocks against the journal ;-) >> >> If the VM is telling the FS to write a pinned page when there are >> unpinned pages that can be written with less cost, then we need to give >> the VM better hints about the actual cost of writing the pinned page. >> > > Oh, this means a much more complicated interface, Grin, we can't really compare interface complexity until both are written and working. > and it means that the > VM must take into account the optimizations of each and every filesystem. > Are you sure this isn't an unmaintainable centralized hell? Decentralization in this case seems much more risky. The VM needs well defined repeatable behaviour. > In practice, > will it really mean that optimizations specific to a particular > filesystem will get ignored, because there will be too many of them to > keep up with, and they will clutter each other up if implemented in one > piece of code? Will programmers really be able to experiment? The idea is to find the basic interface required to do this for us. Internally, the FS needs an interface to give hints to its own subcache, so it must be possible to give hints to a VM. I'm not pretending it will be easy to generalize, but all the filesystems need a very similar set of tools here, so it should be worth the effort. >> >> >> For periodic group flushes (delayed allocation, journal commits, etc), we >> need better throttling on dirty pages instead of just dirty buffers like >> we do now. >> >> I'm not delusional enough to think this will make all the vm<->journal >> nastiness go away, but it hopefully should be less painful than adding >> extra VM intelligence into each FS. >> > Say more about what you mean by better throttling on dirty pages, and how > that meets the needs of slum squeezing, transaction committing, write > clustering, etc. 
Last I remember, the generic write clustering code in > VM didn't even understand packing localities.;-) Most write throttling is done by bdflush right now, because most dirty things that need to hit disk have dirty buffers. For pinned pages, delayed allocation etc, we probably want a rate limiter unrelated to buffers at all, and one that can trigger complex actions from the FS instead of just a simple write-one-page. I'm not saying we should teach the VM how to do these complex operations, but I do think it should be in charge of deciding when they happen as much as possible. In other words, the journal would only trigger a commit on its own when the transaction was full. The other cases (too old, low ram, too many dirty pages) would be triggered by the VM. For write clustering, we could add an int clusterpage(struct page *p) address space op that allow the FS to find pages close to p, or the FS could choose to cluster in its own writepage func. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
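The clusterpage() idea Chris floats can be sketched with a toy dirty map: the VM names one page, and the FS expands it to the surrounding run of contiguous dirty pages. The bitmap and the fixed window are hypothetical simplifications, not a proposal for the real op's signature:

```c
#include <assert.h>

/* Sketch of the clusterpage() idea: the VM picks one page to flush and
 * asks the FS which neighbours should go out in the same I/O. */
#define TOY_NR_PAGES 16
#define TOY_WINDOW   4   /* cluster at most 4 pages on either side */

/* Given a dirty map and a target index, expand to the largest run of
 * contiguous dirty pages around it, bounded by the window.  Returns
 * the number of pages in the cluster and stores its start index. */
int toy_clusterpage(const int dirty[TOY_NR_PAGES], int target, int *start)
{
    int lo = target, hi = target;

    while (lo > 0 && target - lo < TOY_WINDOW && dirty[lo - 1])
        lo--;
    while (hi < TOY_NR_PAGES - 1 && hi - target < TOY_WINDOW && dirty[hi + 1])
        hi++;

    *start = lo;
    return hi - lo + 1;
}
```

The point of the sketch is the division of labour: the VM still chooses the victim page, while the FS contributes only its layout knowledge about which neighbours are cheap to write in the same pass.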
* Re: Possible Idea with filesystem buffering. 2002-01-21 19:44 ` Chris Mason @ 2002-01-21 20:41 ` Hans Reiser 2002-01-21 21:53 ` Chris Mason 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 20:41 UTC (permalink / raw) To: Chris Mason; +Cc: Rik van Riel, Shawn Starr, linux-kernel Chris Mason wrote: > >On Monday, January 21, 2002 08:47:00 PM +0300 Hans Reiser ><reiser@namesys.com> wrote: > >>Chris Mason wrote: >> >>>On Monday, January 21, 2002 05:07:30 PM +0300 Hans Reiser >>><reiser@namesys.com> wrote: >>> >>>>Rik van Riel wrote: >>>> >>>>>On Mon, 21 Jan 2002, Hans Reiser wrote: >>>>> >>>>>>Pressure received is not equal to pages yielded. ... The number of >>>>>>pages yielded should depend on the interplay of pressure received and >>>>>>accesses made. >>>>>> >>>Ah, once the FS starts counting accesses, we get in trouble. The FS >>>should strive to know only these 3 things: >>> >>>How to read useful data into a page >>>How to flush a dirty page >>>How to free a pinned page >>> >>You say this with the all the dogma of someone working with code that >>currently does things a particular way. You provide no reasons though. >> > >;-) In general, every bit of the VM we modify and copy into the FS will: > >A) break later on as the rest of the VM evolves >B) perform poorly on hardware we don't have (numa). >C) make odd, hard to trigger bugs due to strange interactions on large >machines and certain work loads. >D) require almost constant maintenance. > >And that is how it works right now. The journal is a subcache that does >not respond to memory pressure the same way on all the journaled >filesystems, and none of them are optimal. > This is because you didn't want to disturb VM enough to create a proper interface. You were right to have this attitude during code freeze. Code freeze is over. 
> > >>>Everything gets cleaner if we push this info up to the VM in a generic >>>fashion, instead of trying to push bits of the VM down into each >>>filesystem. >>>The FS should have no idea of what memory pressure is, down that path >>>lies pain, suffering, and deadlocks against the journal ;-) >>> >>>If the VM is telling the FS to write a pinned page when there are >>>unpinned pages that can be written with less cost, then we need to give >>>the VM better hints about the actual cost of writing the pinned page. >>> >>Oh, this means a much more complicated interface, >> > >Grin, we can't really compare interface complexity until both are written >and working. > Yah, yah, as the Germans taught me to say.;-) > > >>and it means that the >>VM must take into account the optimizations of each and every filesystem. >>Are you sure this isn't an unmaintainable centralized hell? >> > >Decentralization in this case seems much more risky. The VM needs well >defined repeatable behaviour. > Decentralization always seems more risky. It is why we have so many centralized economies, errh, ..... > > >>In practice, >>will it really mean that optimizations specific to a particular >>filesystem will get ignored, because there will be too many of them to >>keep up with, and they will clutter each other up if implemented in one >>piece of code? Will programmers really be able to experiment? >> > >The idea is to find the basic interface required to do this for us. >Internally, the FS needs an interface to give hints to its own subcache, so > Uh, the hints are called slums and balanced trees and unallocated extents and distinctions between overwrite sets and relocate sets and the difference between internal and leaf nodes and five different mount options for how to allocate blocks and.... I think that asking VM to understand this is simply awful. > >it must be possible to give hints to a VM. 
I'm not pretending it will be >easy to generalize, but all the filesystems need a very similar set of >tools here, so it should be worth the effort. > I prefer the approach used in VFS, in which templates of generic FS code are supplied, and people can use as much or as little of the generic code as they want. This allows people who just want to create a filesystem that can read a particular format to do so without unique optimizations for that FS, and people who want to write a seriously optimized filesystem that understands how to optimize for a particular layout to do so. I think that what you and Saveliev did made sense for 2.4 where we were struggling against a code freeze (well, at least there was supposed to be a code freeze on VM/VFS, but that is history we should not revisit.....), but it is not appropriate for when there is no code freeze. > > >>> >>>For periodic group flushes (delayed allocation, journal commits, etc), we >>>need better throttling on dirty pages instead of just dirty buffers like >>>we do now. >>> >>>I'm not delusional enough to think this will make all the vm<->journal >>>nastiness go away, but it hopefully should be less painful than adding >>>extra VM intelligence into each FS. >>> >>Say more about what you mean by better throttling on dirty pages, and how >>that meets the needs of slum squeezing, transaction committing, write >>clustering, etc. Last I remember, the generic write clustering code in >>VM didn't even understand packing localities.;-) >> > >Most write throttling is done by bdflush right now, because most dirty >things that need to hit disk have dirty buffers. For pinned pages, delayed >allocation etc, we probably want a rate limiter unrelated to buffers at >all, and one that can trigger complex actions from the FS instead of just a >simple write-one-page. > >I'm not saying we should teach the VM how to do these complex operations, >but I do think it should be in charge of deciding when they happen as much >as possible. 
In other words, the journal would only trigger a commit on >its own when the transaction was full. The other cases (too old, low ram, >too many dirty pages) would be triggered by the VM. > I read this and it sounds like you are agreeing with me, which is confusing;-), help me to understand what you mean by triggered. Do you mean VM sends pressure to the FS? Do you mean that VM understands what a transaction is? Is this that generic journaling layer trying to come alive as a piece of the VM? I am definitely confused. I think what I need to understand, is do you see the VM as telling the FS when it has (too many dirty pages or too many clean pages) and letting the FS choose to commit a transaction if it wants to as its way of cleaning pages, or do you see the VM as telling the FS to commit a transaction? If you think that VM should tell the FS when it has too many pages, does that mean that the VM understands that a particular page in the subcache has not been accessed recently enough? Is that the pivot point of our disagreement? > > >For write clustering, we could add an int clusterpage(struct page *p) >address space op that allow the FS to find pages close to p, or the FS >could choose to cluster in its own writepage func. > What you are proposing is not consistent with how Marcelo is doing write clustering as part of the VM, you understand that, yes? What Marcelo is doing is fine for ReiserFS V3 but won't work well for v4, do you agree? Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 20:41 ` Hans Reiser @ 2002-01-21 21:53 ` Chris Mason 2002-01-22 6:02 ` Andreas Dilger 0 siblings, 1 reply; 92+ messages in thread From: Chris Mason @ 2002-01-21 21:53 UTC (permalink / raw) To: Hans Reiser; +Cc: Rik van Riel, Shawn Starr, linux-kernel On Monday, January 21, 2002 11:41:44 PM +0300 Hans Reiser <reiser@namesys.com> wrote: > I read this and it sounds like you are agreeing with me, which is > confusing;-), No, no, you're agreeing with me ;-) > help me to understand what you mean by triggered. Do you > mean VM sends pressure to the FS? Do you mean that VM understands what a > transaction is? Is this that generic journaling layer trying to come > alive as a piece of the VM? I am definitely confused. > The vm doesn't know what a transaction is. But, the vm might know that a) this block is pinned by the FS for write ordering reasons b) the cost of writing this block is X c) calling page->somefunc will trigger writes on those blocks. The cost could be in order of magnitude, the idea would be to give the FS the chance to say 'on a scale of 1 to 10, writing this block will hurt this much'. Some blocks might have negative costs, meaning they don't depend on anything and help free others. The same system can be used for transactions and delayed allocation, without telling the VM about any specifics. > I think what I need to understand, is do you see the VM as telling the FS > when it has (too many dirty pages or too many clean pages) and letting > the FS choose to commit a transaction if it wants to as its way of > cleaning pages, or do you see the VM as telling the FS to commit a > transaction? I see the VM calling page->somefunc to flush that page, triggering whatever events the FS feels are necessary. We might want some way to differentiate between periodic writes and memory pressure, so the FS has the option of doing fancier things during write throttling.
> > If you think that VM should tell the FS when it has too many pages, does > that mean that the VM understands that a particular page in the subcache > has not been accessed recently enough? Is that the pivot point of our > disagreement? Pretty much. I don't think the VM should say 'you have too many pages', I think it should say 'free this page'. >> >> >> For write clustering, we could add an int clusterpage(struct page *p) >> address space op that allow the FS to find pages close to p, or the FS >> could choose to cluster in its own writepage func. >> > What you are proposing is not consistent with how Marcello is doing write > clustering as part of the VM, you understand that, yes? What Marcello is > doing is fine for ReiserFS V3 but won't work well for v4, do you agree? Well, my only point is that it is possible to make an interface for write clustering that gives the FS the freedom to do what it needs, but still keep the intelligence about which pages need freeing first in the VM. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
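Chris's cost hints can be modeled as cheapest-first flushing against an I/O budget, with negative costs standing in for blocks whose writeout unpins others. This is purely illustrative; no such interface existed in the kernel, and the sentinel and budget arithmetic are invented:

```c
#include <assert.h>

#define TOY_FLUSHED 1000   /* sentinel marking an already-written block */

/* Flush cheapest-first until the I/O budget is spent.  Negative-cost
 * blocks (ones whose writeout frees or unpins others) give budget
 * back, so the VM naturally writes them early.  Returns the number of
 * blocks flushed; expensive blocks are left pinned for later. */
int toy_flush_under_pressure(int cost[], int n, int budget)
{
    int flushed = 0;

    for (;;) {
        int best = -1, i;

        for (i = 0; i < n; i++)
            if (cost[i] != TOY_FLUSHED && (best < 0 || cost[i] < cost[best]))
                best = i;
        if (best < 0 || cost[best] > budget)
            break;              /* nothing left we can afford */
        budget -= cost[best];
        cost[best] = TOY_FLUSHED;
        flushed++;
    }
    return flushed;
}
```

The VM never learns what a transaction is; it only sees the numbers, which is the property Chris is after.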
* Re: Possible Idea with filesystem buffering. 2002-01-21 21:53 ` Chris Mason @ 2002-01-22 6:02 ` Andreas Dilger 2002-01-22 10:09 ` Tommi Kyntola ` (2 more replies) 0 siblings, 3 replies; 92+ messages in thread From: Andreas Dilger @ 2002-01-22 6:02 UTC (permalink / raw) To: Chris Mason Cc: Hans Reiser, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel On Jan 21, 2002 16:53 -0500, Chris Mason wrote: > On Monday, January 21, 2002 11:41:44 PM +0300 Hans Reiser wrote: > > help me to understand what you mean by triggered. Do you > > mean VM sends pressure to the FS? Do you mean that VM understands what a > > transaction is? Is this that generic journaling layer trying to come > > alive as a piece of the VM? I am definitely confused. > > The vm doesn't know what a transaction is. But, the vm might know that > a) this block is pinned by the FS for write ordering reasons > b) the cost of writing this block is X > c) calling page->somefunc will trigger writes on those blocks. > > The cost could be in order of magnitude, the idea would be to give the FS > the chance to say 'one a scale of 1 to 10, writing this block will hurt > this much'. Some blocks might have negative costs, meaning they don't > depend on anything and help free others. > > The same system can be used for transactions and delayed allocation, > without telling the VM about any specifics. > > > I think what I need to understand, is do you see the VM as telling the FS > > when it has (too many dirty pages or too many clean pages) and letting > > the FS choose to commit a transaction if it wants to as its way of > > cleaning pages, or do you see the VM as telling the FS to commit a > > transaction? > > I see the VM calling page->somefunc to flush that page, triggering whatever > events the FS feels are necessary. We might want some way to differentiate > between periodic writes and memory pressure, so the FS has the option of > doing fancier things during write throttling. 
The ext3 developers have also been wanting things like this for a long time, both having a "memory pressure" notification, and a differentiation between "write this now" and "this is a periodic sync, write some stuff". I've CC'd them in case they want to contribute. There are also other non-core caches in the kernel which could benefit from having a generic "memory pressure" notification. Having a generic memory pressure notification helps reduce (but not eliminate) the need to call "write this page now" into the filesystem. My guess would be that having calls into the FS with "priorities", just like shrink_dcache_memory() does, would allow the FS to make more intelligent decisions about what to write/free _before_ you get to the stage where the VM is in a panic and is telling you _specifically_ what to write/free/etc. > > If you think that VM should tell the FS when it has too many pages, does > > that mean that the VM understands that a particular page in the subcache > > has not been accessed recently enough? Is that the pivot point of our > > disagreement? > > Pretty much. I don't think the VM should say 'you have too many pages', I > think it should say 'free this page'. As above, it should have the capability to do both, depending on the circumstances. The FS can obviously make better judgements locally about what to write under normal circumstances, so it should be given the best chance to do so. The VM can make better _specific_ judgements when it needs to (e.g. free a DMA page or another specific page to allow a larger contiguous chunk of memory to be allocated), but in the cases where it just wants _some_ page(s) to be freed, it should allow the FS to decide which one(s), if it cares. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ ^ permalink raw reply [flat|nested] 92+ messages in thread
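The priority convention Andreas points to can be sketched in miniature: the VM passes a priority and the cache frees roughly 1/priority of its unused entries, so a large number means mild pressure and priority 1 means desperation, following the shape of 2.4's shrink_dcache_memory(). The struct and arithmetic here are hypothetical:

```c
#include <assert.h>

/* Toy model of a priority-driven shrink call: the VM states how urgent
 * things are, the cache decides which entries to give up. */
struct toy_shrinkable {
    int nr_total;
    int nr_unused;
};

/* Free about 1/priority of the unused entries; priority 1 (or less)
 * empties the unused pool entirely.  Returns entries actually freed. */
int toy_shrink(struct toy_shrinkable *c, int priority)
{
    int goal = priority > 1 ? c->nr_unused / priority : c->nr_unused;

    c->nr_unused -= goal;
    c->nr_total  -= goal;
    return goal;
}
```

This is the middle ground Andreas describes: the VM never names a specific page until it has to, but the pressure it applies is still quantified and repeatable.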
* Re: Possible Idea with filesystem buffering. 2002-01-22 6:02 ` Andreas Dilger @ 2002-01-22 10:09 ` Tommi Kyntola 2002-01-22 11:39 ` Hans Reiser 2002-01-22 14:03 ` Chris Mason 2 siblings, 0 replies; 92+ messages in thread From: Tommi Kyntola @ 2002-01-22 10:09 UTC (permalink / raw) To: Andreas Dilger Cc: Chris Mason, Hans Reiser, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel On Mon, 21 Jan 2002, Andreas Dilger wrote: > On Jan 21, 2002 16:53 -0500, Chris Mason wrote: > > On Monday, January 21, 2002 11:41:44 PM +0300 Hans Reiser wrote: > > > If you think that VM should tell the FS when it has too many pages, does > > > that mean that the VM understands that a particular page in the subcache > > > has not been accessed recently enough? Is that the pivot point of our > > > disagreement? > > > > Pretty much. I don't think the VM should say 'you have too many pages', I > > think it should say 'free this page'. > > As above, it should have the capability to do both, depending on the > circumstances. The FS can obviously make better judgements locally about > what to write under normal circumstances, so it should be given the best > chance to do so. > > The VM can make better _specific_ judgements when it needs to (e.g. free > a DMA page or another specific page to allow a larger contiguous chunk > of memory to be allocated), but in the cases where it just wants _some_ > page(s) to be freed, it should allow the FS to decide which one(s), if > it cares. Which is pretty close to what Anton said. It seems obvious that the VM also needs a (hopefully rare-case) write_page with which the FS should comply, whether or not it's suboptimal for that particular FS. But wouldn't Anton's suggestion of a separate (hopefully more common case) write_some_page give some leash to FS developers to optimize their page releasing based on their own demands? It would at least allow a centralized VM while keeping the other filesystems intact.
-- Tommi "Kynde" Kyntola /* A man alone in the forest talking to himself and no women around to hear him. Is he still wrong? */ ^ permalink raw reply [flat|nested] 92+ messages in thread
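The split Tommi describes, a mandatory write_page plus a discretionary write_some_page, can be sketched as two entry points over one toy cache; the FS-private preference order stands in for whatever layout knowledge the filesystem has. All names are hypothetical:

```c
#include <assert.h>

#define TOY_N 8

struct toy_cache2 {
    int dirty[TOY_N];
    int order[TOY_N];   /* FS-private preference: which page to write first */
};

/* VM names the page: the FS must comply even if it is suboptimal
 * (DMA zone, contiguous allocation, etc.).  Returns 1 if written. */
int toy_write_page(struct toy_cache2 *c, int idx)
{
    if (!c->dirty[idx])
        return 0;
    c->dirty[idx] = 0;
    return 1;
}

/* VM just wants progress: the FS writes whichever dirty page its own
 * preference order ranks first.  Returns that index, or -1 if clean. */
int toy_write_some_page(struct toy_cache2 *c)
{
    int i;

    for (i = 0; i < TOY_N; i++) {
        int idx = c->order[i];

        if (c->dirty[idx]) {
            c->dirty[idx] = 0;
            return idx;
        }
    }
    return -1;
}
```

The common path keeps the VM centralized (it still drives all writeback) while the rare path preserves its ability to demand a specific page.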
* Re: Possible Idea with filesystem buffering. 2002-01-22 6:02 ` Andreas Dilger 2002-01-22 10:09 ` Tommi Kyntola @ 2002-01-22 11:39 ` Hans Reiser 2002-01-22 18:41 ` Andrew Morton 2002-01-22 14:03 ` Chris Mason 2 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-22 11:39 UTC (permalink / raw) To: Andreas Dilger Cc: Chris Mason, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel So is there a consensus view that we need 2 calls, one to write a particular page, and one to exert memory pressure, and the call to write a particular page should only be used when we really need to write that particular page? Are we sure this meets the needs of memory zones, which I need to learn more about the architecture of? Hans Andreas Dilger wrote: >On Jan 21, 2002 16:53 -0500, Chris Mason wrote: > >>On Monday, January 21, 2002 11:41:44 PM +0300 Hans Reiser wrote: >> >>>help me to understand what you mean by triggered. Do you >>>mean VM sends pressure to the FS? Do you mean that VM understands what a >>>transaction is? Is this that generic journaling layer trying to come >>>alive as a piece of the VM? I am definitely confused. >>> >>The vm doesn't know what a transaction is. But, the vm might know that >>a) this block is pinned by the FS for write ordering reasons >>b) the cost of writing this block is X >>c) calling page->somefunc will trigger writes on those blocks. >> >>The cost could be in order of magnitude, the idea would be to give the FS >>the chance to say 'one a scale of 1 to 10, writing this block will hurt >>this much'. Some blocks might have negative costs, meaning they don't >>depend on anything and help free others. >> >>The same system can be used for transactions and delayed allocation, >>without telling the VM about any specifics. 
>> >>>I think what I need to understand, is do you see the VM as telling the FS >>>when it has (too many dirty pages or too many clean pages) and letting >>>the FS choose to commit a transaction if it wants to as its way of >>>cleaning pages, or do you see the VM as telling the FS to commit a >>>transaction? >>> >>I see the VM calling page->somefunc to flush that page, triggering whatever >>events the FS feels are necessary. We might want some way to differentiate >>between periodic writes and memory pressure, so the FS has the option of >>doing fancier things during write throttling. >> > >The ext3 developers have also been wanting things like this for a long time, >both having a "memory pressure" notification, and a differentiation between >"write this now" and "this is a periodic sync, write some stuff". I've >CC'd them in case they want to contribute. > >There are also other non-core caches in the kernel which could benefit >from having a generic "memory pressure" notification. Having a generic >memory pressure notification helps reduce (but not eliminate) the need >to call "write this page now" into the filesystem. > >My guess would be that having calls into the FS with "priorities", just >like shrink_dcache_memory() does, would allow the FS to make more >intelligent decisions about what to write/free _before_ you get to the >stage where the VM is in a panic and is telling you _specifically_ what >to write/free/etc. > >>>If you think that VM should tell the FS when it has too many pages, does >>>that mean that the VM understands that a particular page in the subcache >>>has not been accessed recently enough? Is that the pivot point of our >>>disagreement? >>> >>Pretty much. I don't think the VM should say 'you have too many pages', I >>think it should say 'free this page'. >> > >As above, it should have the capability to do both, depending on the >circumstances. 
The FS can obviously make better judgements locally about >what to write under normal circumstances, so it should be given the best >chance to do so. > >The VM can make better _specific_ judgements when it needs to (e.g. free >a DMA page or another specific page to allow a larger contiguous chunk of >memory to be allocated), but in the cases where it just wants _some_ page(s) >to be freed, it should allow the FS to decide which one(s), if it cares. > >Cheers, Andreas >-- >Andreas Dilger >http://sourceforge.net/projects/ext2resize/ >http://www-mddsp.enel.ucalgary.ca/People/adilger/ > > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 11:39 ` Hans Reiser @ 2002-01-22 18:41 ` Andrew Morton 2002-01-22 19:03 ` Rik van Riel 2002-01-22 20:19 ` Hans Reiser 0 siblings, 2 replies; 92+ messages in thread From: Andrew Morton @ 2002-01-22 18:41 UTC (permalink / raw) To: Hans Reiser Cc: Andreas Dilger, Chris Mason, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel Hans Reiser wrote: > > So is there a consensus view that we need 2 calls, one to write a > particular page, and one to exert memory pressure, and the call to write > a particular page should only be used when we really need to write that > particular page? > Note that writepage() doesn't get used much. Most VM-initiated filesystem writeback activity is via try_to_release_page(), which has somewhat more vague and flexible semantics. And by bdflush, which I suspect tends to conflict with sync_page_buffers() under pressure. But that's a different problem. - ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 18:41 ` Andrew Morton @ 2002-01-22 19:03 ` Rik van Riel 2002-01-23 20:35 ` [Ext2-devel] " Stephen C. Tweedie 2002-01-22 20:19 ` Hans Reiser 1 sibling, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-22 19:03 UTC (permalink / raw) To: Andrew Morton Cc: Hans Reiser, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel On Tue, 22 Jan 2002, Andrew Morton wrote: > Hans Reiser wrote: > > > > So is there a consensus view that we need 2 calls, one to write a > > particular page, and one to exert memory pressure, and the call to write > > a particular page should only be used when we really need to write that > > particular page? > > Note that writepage() doesn't get used much. Most VM-initiated > filesystem writeback activity is via try_to_release_page(), which > has somewhat more vague and flexible semantics. We may want to change this though, or at the very least get rid of the horrible interplay between ->writepage and try_to_release_page() ... regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [Ext2-devel] Re: Possible Idea with filesystem buffering. 2002-01-22 19:03 ` Rik van Riel @ 2002-01-23 20:35 ` Stephen C. Tweedie 2002-01-23 20:48 ` Hans Reiser ` (2 more replies) 0 siblings, 3 replies; 92+ messages in thread From: Stephen C. Tweedie @ 2002-01-23 20:35 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Hans Reiser, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel, Stephen Tweedie Hi, On Tue, Jan 22, 2002 at 05:03:02PM -0200, Rik van Riel wrote: > On Tue, 22 Jan 2002, Andrew Morton wrote: > > Hans Reiser wrote: > > > > Note that writepage() doesn't get used much. Most VM-initiated > > filesystem writeback activity is via try_to_release_page(), which > > has somewhat more vague and flexible semantics. > > We may want to change this though, or at the very least get > rid of the horrible interplay between ->writepage and > try_to_release_page() ... This is actually really important --- writepage on its own cannot distinguish between requests to flush something to disk (eg. msync or fsync), and requests to evict dirty data from memory. This is really important for ext3's data journaling mode --- syncing to disk only requires flushing as far as the journal, but evicting dirty pages requires a full writeback too. That's one place where our traditional VM notion of writepage just isn't quite fine-grained enough. Cheers, Stephen ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [Ext2-devel] Re: Possible Idea with filesystem buffering. 2002-01-23 20:35 ` [Ext2-devel] " Stephen C. Tweedie @ 2002-01-23 20:48 ` Hans Reiser 2002-01-23 20:55 ` Andrew Morton 2002-01-23 23:53 ` Hugh Dickins 2 siblings, 0 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-23 20:48 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Rik van Riel, Andrew Morton, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel Stephen C. Tweedie wrote: >Hi, > >On Tue, Jan 22, 2002 at 05:03:02PM -0200, Rik van Riel wrote: > >>On Tue, 22 Jan 2002, Andrew Morton wrote: >> >>>Hans Reiser wrote: >>> >>>Note that writepage() doesn't get used much. Most VM-initiated >>>filesystem writeback activity is via try_to_release_page(), which >>>has somewhat more vague and flexible semantics. >>> >>We may want to change this though, or at the very least get >>rid of the horrible interplay between ->writepage and >>try_to_release_page() ... >> > >This is actually really important --- writepage on its own cannot >distinguish between requests to flush something to disk (eg. msync or >fsync), and requests to evict dirty data from memory. > >This is really important for ext3's data journaling mode --- syncing >to disk only requires flushing as far as the journal, but evicting >dirty pages requires a full writeback too. That's one place where our >traditional VM notion of writepage just isn't quite fine-grained >enough. > >Cheers, > Stephen > > I think this is a good point Stephen is making. So we have: * write this particular page at this particular memory address (for DMA setup or other reasons). * write the data on this page * apply X units of aging pressure to the subcache if it is distinct from the general cache and supports a pressure operation. as the three distinct needs we are needing to serve in the design of the interface. 
Rik, are you comfortable now with this cache plugin approach I am advocating, now that I have explained that it is motivated by the need to handle objects that are not flushed in pages? You have had another day to think about it, and you didn't quite say yes (though it did seem you no longer think me crazy). Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [Ext2-devel] Re: Possible Idea with filesystem buffering. 2002-01-23 20:35 ` [Ext2-devel] " Stephen C. Tweedie 2002-01-23 20:48 ` Hans Reiser @ 2002-01-23 20:55 ` Andrew Morton 2002-01-23 23:53 ` Hugh Dickins 2 siblings, 0 replies; 92+ messages in thread From: Andrew Morton @ 2002-01-23 20:55 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Rik van Riel, Hans Reiser, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel "Stephen C. Tweedie" wrote: > > Hi, > > On Tue, Jan 22, 2002 at 05:03:02PM -0200, Rik van Riel wrote: > > On Tue, 22 Jan 2002, Andrew Morton wrote: > > > Hans Reiser wrote: > > > > > > Note that writepage() doesn't get used much. Most VM-initiated > > > filesystem writeback activity is via try_to_release_page(), which > > > has somewhat more vague and flexible semantics. > > > > We may want to change this though, or at the very least get > > rid of the horrible interplay between ->writepage and > > try_to_release_page() ... > > This is actually really important --- writepage on its own cannot > distinguish between requests to flush something to disk (eg. msync or > fsync), and requests to evict dirty data from memory. > > This is really important for ext3's data journaling mode --- syncing > to disk only requires flushing as far as the journal, but evicting > dirty pages requires a full writeback too. That's one place where our > traditional VM notion of writepage just isn't quite fine-grained > enough. And we currently use PF_MEMALLOC to work out which context we're being called from. Sigh. I wish I'd taken better notes of all the square pegs which ext3 had to push into the kernel's round holes. But there were so many :) - ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [Ext2-devel] Re: Possible Idea with filesystem buffering. 2002-01-23 20:35 ` [Ext2-devel] " Stephen C. Tweedie 2002-01-23 20:48 ` Hans Reiser 2002-01-23 20:55 ` Andrew Morton @ 2002-01-23 23:53 ` Hugh Dickins 2002-01-24 0:01 ` Jeff Garzik 2 siblings, 1 reply; 92+ messages in thread From: Hugh Dickins @ 2002-01-23 23:53 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Rik van Riel, Andrew Morton, Hans Reiser, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel On Wed, 23 Jan 2002, Stephen C. Tweedie wrote: > > This is actually really important --- writepage on its own cannot > distinguish between requests to flush something to disk (eg. msync or > fsync), and requests to evict dirty data from memory. Actually, that much can now be distinguished: PageLaunder(page) when evicting from memory, !PageLaunder(page) when msync or fsync. Hugh ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [Ext2-devel] Re: Possible Idea with filesystem buffering. 2002-01-23 23:53 ` Hugh Dickins @ 2002-01-24 0:01 ` Jeff Garzik 0 siblings, 0 replies; 92+ messages in thread From: Jeff Garzik @ 2002-01-24 0:01 UTC (permalink / raw) To: Hugh Dickins Cc: Stephen C. Tweedie, Rik van Riel, Andrew Morton, Hans Reiser, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel Hugh Dickins wrote: > > On Wed, 23 Jan 2002, Stephen C. Tweedie wrote: > > > > This is actually really important --- writepage on its own cannot > > distinguish between requests to flush something to disk (eg. msync or > > fsync), and requests to evict dirty data from memory. > > Actually, that much can now be distinguished: > PageLaunder(page) when evicting from memory, > !PageLaunder(page) when msync or fsync. Nifty! Thanks for pointing this out. -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." MandrakeSoft | - goats.com ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 18:41 ` Andrew Morton 2002-01-22 19:03 ` Rik van Riel @ 2002-01-22 20:19 ` Hans Reiser 2002-01-22 20:50 ` Rik van Riel 1 sibling, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-22 20:19 UTC (permalink / raw) To: Andrew Morton Cc: Andreas Dilger, Chris Mason, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel Andrew Morton wrote: >Hans Reiser wrote: > >>So is there a consensus view that we need 2 calls, one to write a >>particular page, and one to exert memory pressure, and the call to write >>a particular page should only be used when we really need to write that >>particular page? >> > >Note that writepage() doesn't get used much. Most VM-initiated >filesystem writeback activity is via try_to_release_page(), which >has somewhat more vague and flexible semantics. > >And by bdflush, which I suspect tends to conflict with sync_page_buffers() >under pressure. But that's a different problem. > >- > > So the problem is that there is no coherently architected VM-to-FS interface that has been articulated, and we need one. So far we can identify that we need something to pressure the FS, and something to ask for a particular page. It might be desirable to pressure the FS more than one page aging at a time for reasons of performance as Rik pointed out. Any other design considerations? ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:19 ` Hans Reiser @ 2002-01-22 20:50 ` Rik van Riel 0 siblings, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-22 20:50 UTC (permalink / raw) To: Hans Reiser Cc: Andrew Morton, Andreas Dilger, Chris Mason, Shawn Starr, linux-kernel, ext2-devel On Tue, 22 Jan 2002, Hans Reiser wrote: > So the problem is that there is no coherently architected VM-to-FS > interface that has been articulated, and we need one. Absolutely agreed. One of the main design elements for such an interface would be doing all filesystem things in the filesystem and all VM things in the VM so we don't get frankenstein monsters on either side of the fence. > So far we can identify that we need something to pressure the FS, and > something to ask for a particular page. > > It might be desirable to pressure the FS more than one page aging at a > time for reasons of performance as Rik pointed out. > Any other design considerations? One of the things we really want to do in the VM is pre-clean data and just reclaim clean pages later on. This means it would be easiest/best if the filesystem took care of _just_ writing out data and if freeing the data later on would be left to the VM. I understand this is not always possible due to stuff like metadata repacking, but I guess we can ignore this case for now since the metadata is hopefully small and won't unbalance the VM. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 6:02 ` Andreas Dilger 2002-01-22 10:09 ` Tommi Kyntola 2002-01-22 11:39 ` Hans Reiser @ 2002-01-22 14:03 ` Chris Mason 2002-01-22 14:39 ` Rik van Riel 2 siblings, 1 reply; 92+ messages in thread From: Chris Mason @ 2002-01-22 14:03 UTC (permalink / raw) To: Andreas Dilger Cc: Hans Reiser, Rik van Riel, Shawn Starr, linux-kernel, ext2-devel On Monday, January 21, 2002 11:02:49 PM -0700 Andreas Dilger <adilger@turbolabs.com> wrote: [ snip ] It seems like the basic features we are suggesting are very close, I'll try one last time to make a case against the 'free_some_pages' call ;-) > > The VM can make better _specific_ judgements when it needs to (e.g. free > a DMA page or another specific page to allow a larger contiguous chunk of > memory to be allocated), but in the cases where it just wants _some_ > page(s) to be freed, it should allow the FS to decide which one(s), if it > cares. I'd rather see the VM trigger a flush on a specific page, but tell the FS it's OK to do broader actions if it wants to. In the case of write throttling, the FS doesn't know which page has been dirty the longest, unless it starts maintaining its own lists. The VM has all that information, so it kicks the throttle or periodic write off with one buffer, and lets the FS trigger other events because we aren't under huge memory load. The FS doesn't know how long a page has been dirty, or how often it gets used, or anything other than this page is pinned and waiting for X event to take place. If we really can't get this info to the VM in a useful fashion, that's one thing. But if we can clue the VM in a little and put the decision making there, I think the end result will be more likely to clean the right page. That does affect performance even when we're not under heavy memory pressure. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 14:03 ` Chris Mason @ 2002-01-22 14:39 ` Rik van Riel 2002-01-22 18:46 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-22 14:39 UTC (permalink / raw) To: Chris Mason Cc: Andreas Dilger, Hans Reiser, Shawn Starr, linux-kernel, ext2-devel On Tue, 22 Jan 2002, Chris Mason wrote: > It seems like the basic features we are suggesting are very close, I'll try > one last time to make a case against the 'free_some_pages' call ;-) > The FS doesn't know how long a page has been dirty, or how often it > gets used, In an efficient system, the FS will never get to know this, either. The whole idea behind the VFS and the VM is that calls to the FS are avoided as much as possible, in order to keep the system fast. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 14:39 ` Rik van Riel @ 2002-01-22 18:46 ` Hans Reiser 2002-01-22 19:19 ` Chris Mason 2002-01-22 20:20 ` Rik van Riel 0 siblings, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-22 18:46 UTC (permalink / raw) To: Rik van Riel Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Rik van Riel wrote: >On Tue, 22 Jan 2002, Chris Mason wrote: > >>It seems like the basic features we are suggesting are very close, I'll try >>one last time to make a case against the 'free_some_pages' call ;-) >> > >>The FS doesn't know how long a page has been dirty, or how often it >>gets used, >> > >In an efficient system, the FS will never get to know this, either. > I don't understand this statement. If dereferencing a vfs op for every page aging is too expensive, then ask it to age more than one page at a time. Or do I miss your meaning? > > >The whole idea behind the VFS and the VM is that calls to the FS >are avoided as much as possible, in order to keep the system fast. > In other words, you write the core of our filesystem for us, and we write the parts that don't interest you? Maybe this is the real meat of the issue? Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 18:46 ` Hans Reiser @ 2002-01-22 19:19 ` Chris Mason 2002-01-22 20:13 ` Steve Lord 2002-01-22 20:32 ` Hans Reiser 2002-01-22 20:20 ` Rik van Riel 1 sibling, 2 replies; 92+ messages in thread From: Chris Mason @ 2002-01-22 19:19 UTC (permalink / raw) To: Hans Reiser, Rik van Riel Cc: Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Tuesday, January 22, 2002 09:46:07 PM +0300 Hans Reiser <reiser@namesys.com> wrote: > Rik van Riel wrote: >>> The FS doesn't know how long a page has been dirty, or how often it >>> gets used, >> In an efficient system, the FS will never get to know this, either. > > I don't understand this statement. If dereferencing a vfs op for every > page aging is too expensive, then ask it to age more than one page at a > time. Or do I miss your meaning? Its not about the cost of a function call, it's what the FS does to make that call useful. Pretend for a second the VM tells the FS everything it needs to know to age a page (whatever scheme the FS wants to use). Then pretend the VM decides there's memory pressure, and tells the FS subcache to start freeing ram. So, the FS goes through its list of pages and finds the most suitable one for flushing, but it has no idea how suitable that page is in comparison with the pages that don't belong to that FS (or even other pages from different mount points of the same FS flavor). Since each subcache has its own aging scheme, you can't look at a page from subcache A and compare it with a page from subcache B. All the filesystem can do is flush its own pages, which might be the least suitable pages on the entire box. The VM has no way of knowing, and neither does the FS, and that's why its inefficient. Please let me know if I misunderstood the original plan ;-) -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 19:19 ` Chris Mason @ 2002-01-22 20:13 ` Steve Lord 2002-01-22 21:22 ` Chris Mason 2002-01-22 20:32 ` Hans Reiser 1 sibling, 1 reply; 92+ messages in thread From: Steve Lord @ 2002-01-22 20:13 UTC (permalink / raw) To: Chris Mason Cc: Hans Reiser, Rik van Riel, Andreas Dilger, Shawn Starr, Linux Kernel, ext2-devel On Tue, 2002-01-22 at 13:19, Chris Mason wrote: > > > On Tuesday, January 22, 2002 09:46:07 PM +0300 Hans Reiser > <reiser@namesys.com> wrote: > > > Rik van Riel wrote: > >>> The FS doesn't know how long a page has been dirty, or how often it > >>> gets used, > >> In an efficient system, the FS will never get to know this, either. > > > > I don't understand this statement. If dereferencing a vfs op for every > > page aging is too expensive, then ask it to age more than one page at a > > time. Or do I miss your meaning? > > Its not about the cost of a function call, it's what the FS does to make > that call useful. Pretend for a second the VM tells the FS everything it > needs to know to age a page (whatever scheme the FS wants to use). > > Then pretend the VM decides there's memory pressure, and tells the FS > subcache to start freeing ram. So, the FS goes through its list of pages > and finds the most suitable one for flushing, but it has no idea how > suitable that page is in comparison with the pages that don't belong to > that FS (or even other pages from different mount points of the same FS > flavor). > > Since each subcache has its own aging scheme, you can't look at a page from > subcache A and compare it with a page from subcache B. > > All the filesystem can do is flush its own pages, which might be the least > suitable pages on the entire box. The VM has no way of knowing, and > neither does the FS, and that's why its inefficient. > > Please let me know if I misunderstood the original plan ;-) > Looks like I've been missing an interesting thread here .... 
Surely flushing pages (and hence cleaning them) is not a bad thing to do, provided you do not suck up all the available I/O bandwidth in the process. The filesystem decides to clean the pages as it is efficient from an I/O point of view. The vm is then free to reuse lots of pages it could not before, but it still gets to make the decision about the pages being good ones to reuse. The xfs kernel changes add a call to writepage into the buffer flushing path when the data is delayed allocate. We then end up issuing I/O on surrounding pages which end up being contiguous on disk and are not currently locked by some other thread. Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:13 ` Steve Lord @ 2002-01-22 21:22 ` Chris Mason 0 siblings, 0 replies; 92+ messages in thread From: Chris Mason @ 2002-01-22 21:22 UTC (permalink / raw) To: Steve Lord Cc: Hans Reiser, Rik van Riel, Andreas Dilger, Shawn Starr, Linux Kernel, ext2-devel On Tuesday, January 22, 2002 02:13:18 PM -0600 Steve Lord <lord@sgi.com> wrote: > Looks like I've been missing an interesting thread here .... Hi Steve ;-) > > Surely flushing pages (and hence cleaning them) is not a bad thing to > do, provided you do not suck up all the available I/O bandwidth in the > process. The filesystem decides to clean the pages as it is efficient > from an I/O point of view. The vm is then free to reuse lots of pages > it could not before, but it still gets to make the decision about the > pages being good ones to reuse. Very true, there are a few different workloads to consider. 1) The box really needs ram right now, and we should do the minimum amount of work to get it done. This is usually done by kswapd or a process doing an allocation. It should help if the FS gives the VM enough details to skip pages that require extra allocations (like commit blocks) in favor of less expensive ones. 2) There's lots of dirty pages around, it would be a good idea to flush some, regardless of how many pages might be freeable afterwards. This is where we want most of the i/o to actually happen, and where we want to give the FS the most freedom in regards to which pages get written. > > The xfs kernel changes add a call to writepage into the buffer flushing > path when the data is delayed allocate. We then end up issuing I/O on > surrounding pages which end up being contiguous on disk and are not > currently locked by some other thread. This probably helps in both situations listed, assuming things like HIGHMEM bounce buffers don't come into play. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 19:19 ` Chris Mason 2002-01-22 20:13 ` Steve Lord @ 2002-01-22 20:32 ` Hans Reiser 2002-01-22 21:08 ` Chris Mason 2002-01-22 21:12 ` Rik van Riel 1 sibling, 2 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-22 20:32 UTC (permalink / raw) To: Chris Mason Cc: Rik van Riel, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Chris Mason wrote: > >On Tuesday, January 22, 2002 09:46:07 PM +0300 Hans Reiser ><reiser@namesys.com> wrote: > >>Rik van Riel wrote: >> >>>>The FS doesn't know how long a page has been dirty, or how often it >>>>gets used, >>>> >>>In an efficient system, the FS will never get to know this, either. >>> >>I don't understand this statement. If dereferencing a vfs op for every >>page aging is too expensive, then ask it to age more than one page at a >>time. Or do I miss your meaning? >> > >Its not about the cost of a function call, it's what the FS does to make >that call useful. Pretend for a second the VM tells the FS everything it >needs to know to age a page (whatever scheme the FS wants to use). > >Then pretend the VM decides there's memory pressure, and tells the FS >subcache to start freeing ram. So, the FS goes through its list of pages >and finds the most suitable one for flushing, but it has no idea how >suitable that page is in comparison with the pages that don't belong to >that FS (or even other pages from different mount points of the same FS >flavor). > Why does it need to know how suitable it is compared to the other subcaches? It just ages X pages, and depends on the VM to determine how large X is. The VM pressures subcaches in proportion to their size, it doesn't need to know how suitable one page is compared to another, it just has a notion of push on everyone in proportion to their size. > > >Since each subcache has its own aging scheme, you can't look at a page from >subcache A and compare it with a page from subcache B. 
> Chris, the VM doesn't compare one page to another within a unified cache, so why should it compare one page to another within the delegated cache management scheme? The VM ages until it gets what it wants, in the current scheme. In the scheme I propose it requests aging from the subcaches until it gets what it wants, instead of doing aging until it gets what it wants. Note that there is some slight inaccuracy in this, in that the current scheme has ordered lists, but my point remains valid, especially if we move to aging based on usage minus age counts, which I think Rik may be supportive of (it makes it easier to give less staying power to a page that is read only once, and I would say it was Rik's idea except that I have probably distorted it in repeating it). > > >All the filesystem can do is flush its own pages, which might be the least >suitable pages on the entire box. The VM has no way of knowing, and >neither does the FS, and that's why its inefficient. > >Please let me know if I misunderstood the original plan ;-) > Thanks for pointing out what needed to be articulated. Is it more clear now? Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:32 ` Hans Reiser @ 2002-01-22 21:08 ` Chris Mason 2002-01-22 22:05 ` Hans Reiser ` (3 more replies) 2002-01-22 21:12 ` Rik van Riel 1 sibling, 4 replies; 92+ messages in thread From: Chris Mason @ 2002-01-22 21:08 UTC (permalink / raw) To: Hans Reiser Cc: Rik van Riel, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Tuesday, January 22, 2002 11:32:09 PM +0300 Hans Reiser <reiser@namesys.com> wrote: >> Its not about the cost of a function call, it's what the FS does to make >> that call useful. Pretend for a second the VM tells the FS everything it >> needs to know to age a page (whatever scheme the FS wants to use). >> >> Then pretend the VM decides there's memory pressure, and tells the FS >> subcache to start freeing ram. So, the FS goes through its list of pages >> and finds the most suitable one for flushing, but it has no idea how >> suitable that page is in comparison with the pages that don't belong to >> that FS (or even other pages from different mount points of the same FS >> flavor). >> > > Why does it need to know how suitable it is compared to the other > subcaches? It just ages X pages, and depends on the VM to determine how > large X is. The VM pressures subcaches in proportion to their size, it > doesn't need to know how suitable one page is compared to another, it > just has a notion of push on everyone in proportion to their size. If subcache A has 1000 pages that are very very active, and subcache B has 500 pages that never ever get used, should A get twice as much memory pressure? That's what we want to avoid, and I don't see how subcaches allow it. -chris ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:08 ` Chris Mason @ 2002-01-22 22:05 ` Hans Reiser 2002-01-22 22:21 ` Rik van Riel 2002-01-22 22:10 ` Richard B. Johnson ` (2 subsequent siblings) 3 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-22 22:05 UTC (permalink / raw) To: Chris Mason Cc: Rik van Riel, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Chris Mason wrote: > >On Tuesday, January 22, 2002 11:32:09 PM +0300 Hans Reiser ><reiser@namesys.com> wrote: > >>>Its not about the cost of a function call, it's what the FS does to make >>>that call useful. Pretend for a second the VM tells the FS everything it >>>needs to know to age a page (whatever scheme the FS wants to use). >>> >>>Then pretend the VM decides there's memory pressure, and tells the FS >>>subcache to start freeing ram. So, the FS goes through its list of pages >>>and finds the most suitable one for flushing, but it has no idea how >>>suitable that page is in comparison with the pages that don't belong to >>>that FS (or even other pages from different mount points of the same FS >>>flavor). >>> >>Why does it need to know how suitable it is compared to the other >>subcaches? It just ages X pages, and depends on the VM to determine how >>large X is. The VM pressures subcaches in proportion to their size, it >>doesn't need to know how suitable one page is compared to another, it >>just has a notion of push on everyone in proportion to their size. >> > >If subcache A has 1000 pages that are very very active, and subcache B has >500 pages that never ever get used, should A get twice as much memory >pressure? That's what we want to avoid, and I don't see how subcaches >allow it. > >-chris > > > > Yes, it should get twice as much pressure, but that does not mean it should free twice as many pages, it means it should age twice as many pages, and then the accesses will un-age them. Make more sense now? 
Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 22:05 ` Hans Reiser @ 2002-01-22 22:21 ` Rik van Riel 2002-01-23 0:16 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-22 22:21 UTC (permalink / raw) To: Hans Reiser Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Wed, 23 Jan 2002, Hans Reiser wrote: > Yes, it should get twice as much pressure, but that does not mean it > should free twice as many pages, it means it should age twice as many > pages, and then the accesses will un-age them. > > Make more sense now? So basically you are saying that each filesystem should implement the code to age all pages equally and react equally to memory pressure ... ... essentially duplicating what the current VM already does! regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 22:21 ` Rik van Riel @ 2002-01-23 0:16 ` Hans Reiser 0 siblings, 0 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-23 0:16 UTC (permalink / raw) To: Rik van Riel Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Rik van Riel wrote: >On Wed, 23 Jan 2002, Hans Reiser wrote: > >>Yes, it should get twice as much pressure, but that does not mean it >>should free twice as many pages, it means it should age twice as many >>pages, and then the accesses will un-age them. >> >>Make more sense now? >> > >So basically you are saying that each filesystem should >implement the code to age all pages equally and react >equally to memory pressure ... > >... essentially duplicating what the current VM already >does! > >regads, > >Rik > If the object appropriate for the subcache is either larger (reiser4 slums), or smaller (have to reread that code to remember whether dentries can reasonably be coded to be squeezed over to other pages, I think so, if yes then they are an example of smaller, maybe someone can say something on this) than a page, then you ought to age objects with a granularity other than that of a page. You can express the aging in units of pages (and the subcache can convert the units), but the aging should be applied in units of the object being cached. Just to confuse things, there are middle ground solutions as well. For instance, reiser4 slums are variable size, and can even have maximums if we want it. If we are lazy coders (and we might be), we could even choose to track aging at page granularity, and be just like the generic VM code, except for the final flush moment when we will consider flushing 64 nodes to disk to count as 64 agings that our cache yielded up as its fair share. With regards to that last sentence, I need more time to think about whether that is really reasonably optimal to do and simpler to code. Consider an analogy with reiser4 plugins. 
One of my constant battles is that my programmers want to take all the code that they think most plugins will have to do, and force all plugin authors to do it that way by not making the mostly common code part of the generic plugin templates. The right way to do it is to create generic templates, let the plugin authors add their couple of function calls that are unique to their plugin to the generic template code, and get them to use the generic template for reasons of convenience, not compulsion. I am asking you to create a cache plugin architecture for VM. It will be cool, people will use it for all sorts of weird and useful optimizations of obscure but important to someone caches (maybe even dcache if nothing prevents relocating dcache entries, wish I could remember), trust me.:) It is probably more important to caches other than ReiserFS that there be this kind of architecture (we could survive the reduction in optimality from flushing more than our fair share, it wouldn't kill us, but I like to ask for the right design on principle, and I think that for other caches it really will matter. It is also possible that some future ReiserFS I don't yet imagine will more significantly benefit from such a right design.) Ok, so it seems we are much less far apart now than we were previously.:) I remain curious about what dinner cooked by you using fresh Brazilian ingredients tastes like. The tantalizing thought still lurks in the back of my mind where you planted it.:) I MUST generate a business requirement for going to Brazil.....:-) Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:08 ` Chris Mason 2002-01-22 22:05 ` Hans Reiser @ 2002-01-22 22:10 ` Richard B. Johnson 2002-01-23 1:14 ` Stuart Young 2002-01-23 17:16 ` Daniel Phillips 3 siblings, 0 replies; 92+ messages in thread From: Richard B. Johnson @ 2002-01-22 22:10 UTC (permalink / raw) To: Chris Mason Cc: Hans Reiser, Rik van Riel, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel What's wrong with having the file-system call a VM function to free some buffer once it's been written and hasn't been accessed recently? Isn't that what's being done already? That keeps FS in the FS and VM in VM. The file-system is the only thing that "knows" or should know about file-system activity. The only problem I see with the current implementation is that it "seems as though" the file-system keeps old data too long. Therefore, RAM gets short. The actual buffer(s) that get written and then released should be based upon "least-recently-used". Buffers should be written until some target of free memory is reached. Presently it doesn't seem as though we have such a target. Therefore, we eventually run out of RAM and try to find some magic algorithm to use. As a last resort, we kill processes. This is NotGood(tm). We need a free-RAM target, possibly based upon a percentage of available RAM. The lack of such a target is what causes the out-of-RAM condition we have been experiencing. Somebody thought that "free RAM is wasted RAM" and the VM has been based upon that theory. That theory has been proven incorrect. You need free RAM, just like you need "excess horsepower" to make automobiles drivable. That free RAM is the needed "rubber-band" to absorb the dynamics of real-world systems. That free-RAM target can be attacked both by the file-system(s) and the VM system. The file-system gives LRU buffers until it has obtained the free-RAM target, without regard for the fact that VM may immediately use those pages for process expansion. 
VM will also give up LRU pages until it has reached the same target. These targets occur at different times, which is the exact mechanism necessary to load-balance available RAM. VM can write to swap if it needs, to satisfy its free-RAM target but writing to swap has to go directly to the device or you will oscillate if the swap-write doesn't free its buffers. In other words, you don't free cache-RAM by writing to a cached file-system. You will eventually settle into the time-constant which causes oscillation. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips). I was going to compile a list of innovations that could be attributed to Microsoft. Once I realized that Ctrl-Alt-Del was handled in the BIOS, I found that there aren't any. ^ permalink raw reply [flat|nested] 92+ messages in thread
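Dick's free-RAM target can be modelled as a simple watermark loop. The following is a toy userspace sketch, not kernel code, with all structures and behaviour invented for illustration: release least-recently-used buffers until free memory reaches a percentage target of total RAM.

```c
/* Illustrative userspace model (not kernel code) of a free-RAM
 * target: keep releasing least-recently-used buffers until free
 * memory reaches a fixed percentage of total RAM. */
#include <stdbool.h>

struct mem_state {
    unsigned long total_pages;
    unsigned long free_pages;
};

/* Hypothetical per-iteration reclaim: pretend we clean and release
 * one LRU buffer, yielding one free page. */
static bool release_one_lru_buffer(struct mem_state *m)
{
    if (m->free_pages >= m->total_pages)
        return false;   /* nothing left to give back */
    m->free_pages++;
    return true;
}

/* Reclaim until free RAM reaches target_pct percent of total;
 * return how many buffers were released. */
static unsigned long reclaim_to_target(struct mem_state *m, unsigned target_pct)
{
    unsigned long target = m->total_pages * target_pct / 100;
    unsigned long freed = 0;

    while (m->free_pages < target && release_one_lru_buffer(m))
        freed++;
    return freed;
}
```

The "rubber-band" is the gap between the current free count and the target: reclaim only runs while the system is below it, and stops as soon as the cushion is restored.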
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:08 ` Chris Mason 2002-01-22 22:05 ` Hans Reiser 2002-01-22 22:10 ` Richard B. Johnson @ 2002-01-23 1:14 ` Stuart Young 2002-01-23 17:16 ` Daniel Phillips 3 siblings, 0 replies; 92+ messages in thread From: Stuart Young @ 2002-01-23 1:14 UTC (permalink / raw) To: linux-kernel Cc: Hans Reiser, Rik van Riel, Andreas Dilger, Shawn Starr, root, Chris Mason At 05:10 PM 22/01/02 -0500, Richard B. Johnson wrote: >We need a free-RAM target, possibly based upon a percentage of >available RAM. The lack of such a target is what causes the >out-of-RAM condition we have been experiencing. Somebody thought >that "free RAM is wasted RAM" and the VM has been based upon >that theory. That theory has been proven incorrect. You need >free RAM, just like you need "excess horsepower" to make >automobiles drivable. That free RAM is the needed "rubber-band" >to absorb the dynamics of real-world systems. It'd be nice if this cache high/low watermark was adjustable, preferably through say the sysctl interface, on a running kernel. This would mean that a competent system administrator could tune the system to their needs. A decent runscript for a particular program (I'm assuming run as root here) could adjust the value to absorb the dynamics of a particular program. Stuart Young - sgy@amc.com.au (aka Cefiar) - cefiar1@optushome.com.au [All opinions expressed in the above message are my] [own and not necessarily the views of my employer..] ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:08 ` Chris Mason ` (2 preceding siblings ...) 2002-01-23 1:14 ` Stuart Young @ 2002-01-23 17:16 ` Daniel Phillips 3 siblings, 0 replies; 92+ messages in thread From: Daniel Phillips @ 2002-01-23 17:16 UTC (permalink / raw) To: Chris Mason, Hans Reiser Cc: Rik van Riel, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On January 22, 2002 10:08 pm, Chris Mason wrote: > On Tuesday, January 22, 2002 11:32:09 PM +0300 Hans Reiser wrote: > >> Its not about the cost of a function call, it's what the FS does to make > >> that call useful. Pretend for a second the VM tells the FS everything it > >> needs to know to age a page (whatever scheme the FS wants to use). > >> > >> Then pretend the VM decides there's memory pressure, and tells the FS > >> subcache to start freeing ram. So, the FS goes through its list of pages > >> and finds the most suitable one for flushing, but it has no idea how > >> suitable that page is in comparison with the pages that don't belong to > >> that FS (or even other pages from different mount points of the same FS > >> flavor). > > > > Why does it need to know how suitable it is compared to the other > > subcaches? It just ages X pages, and depends on the VM to determine how > > large X is. The VM pressures subcaches in proportion to their size, it > > doesn't need to know how suitable one page is compared to another, it > > just has a notion of push on everyone in proportion to their size. > > If subcache A has 1000 pages that are very very active, and subcache B has > 500 pages that never ever get used, should A get twice as much memory > pressure? That's what we want to avoid, and I don't see how subcaches > allow it. This question at least is not difficult. Pressure (for writeout) should be applied to each subcache in proportion to its portion of all inactive, dirty pages in the system. -- Daniel ^ permalink raw reply [flat|nested] 92+ messages in thread
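Daniel's rule is compact enough to state as code. A hedged sketch with invented names (this is not a kernel API): each subcache's writeout quota is its share of the system-wide inactive dirty pages, so subcache A's very active pages attract no writeout pressure at all.

```c
/* Sketch of the rule above: distribute a writeout quota across
 * subcaches in proportion to each one's share of the system's
 * inactive dirty pages. Names are illustrative, not kernel API. */
static unsigned long writeout_quota(unsigned long subcache_inactive_dirty,
                                    unsigned long total_inactive_dirty,
                                    unsigned long pages_to_write)
{
    if (total_inactive_dirty == 0)
        return 0;   /* nothing inactive and dirty: no pressure to share out */
    return pages_to_write * subcache_inactive_dirty / total_inactive_dirty;
}
```

Under this rule, Chris's example resolves cleanly: a subcache of 1000 very active pages has few inactive dirty pages and so receives little pressure, while the 500 never-used dirty pages absorb most of it.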
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:32 ` Hans Reiser 2002-01-22 21:08 ` Chris Mason @ 2002-01-22 21:12 ` Rik van Riel 2002-01-22 21:28 ` Shawn Starr 1 sibling, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-22 21:12 UTC (permalink / raw) To: Hans Reiser Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Tue, 22 Jan 2002, Hans Reiser wrote: > Why does it need to know how suitable it is compared to the other > subcaches? It just ages X pages, How the hell is the filesystem supposed to age pages ? The filesystem DOES NOT KNOW how often pages are used, so it cannot age the pages. End of thread. Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:12 ` Rik van Riel @ 2002-01-22 21:28 ` Shawn Starr 2002-01-22 21:31 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-22 21:28 UTC (permalink / raw) To: Rik van Riel Cc: Hans Reiser, Chris Mason, Andreas Dilger, linux-kernel, ext2-devel I've started writing a pagebuf daemon (experimenting with ramfs). It will have the VM manage the allocating/freeing of pages. The filesystem should not have to know when a page needs to be freed or allocated. It just needs pages. The pagebuf is supposed to age pages, not the filesystem. Shawn. On Tue, 2002-01-22 at 16:12, Rik van Riel wrote: > On Tue, 22 Jan 2002, Hans Reiser wrote: > > > Why does it need to know how suitable it is compared to the other > > subcaches? It just ages X pages, > > How the hell is the filesystem supposed to age pages ? > > The filesystem DOES NOT KNOW how often pages are used, > so it cannot age the pages. > > End of thread. > > Rik > -- > "Linux holds advantages over the single-vendor commercial OS" > -- Microsoft's "Competing with Linux" document > > http://www.surriel.com/ http://distro.conectiva.com/ > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 21:28 ` Shawn Starr @ 2002-01-22 21:31 ` Rik van Riel 0 siblings, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-22 21:31 UTC (permalink / raw) To: Shawn Starr Cc: Hans Reiser, Chris Mason, Andreas Dilger, linux-kernel, ext2-devel On 22 Jan 2002, Shawn Starr wrote: > I've started on writing a pagebuf daemon (experimenting with ramfs). > It will have the VM manage the allocating/freeing of pages. The > filesystem should not have to know when a page needs to be freed or > allocated. It just need pages. The pagebuf is supposed to age pages > not the filesystem. Last I looked it was try_to_free_pages() which does the aging of pages. What functionality would a pagebuf daemon add in this regard ? Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 18:46 ` Hans Reiser 2002-01-22 19:19 ` Chris Mason @ 2002-01-22 20:20 ` Rik van Riel 2002-01-22 22:31 ` Hans Reiser 2002-01-23 17:15 ` Josh MacDonald 1 sibling, 2 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-22 20:20 UTC (permalink / raw) To: Hans Reiser Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Tue, 22 Jan 2002, Hans Reiser wrote: > Rik van Riel wrote: > >On Tue, 22 Jan 2002, Chris Mason wrote: > >>The FS doesn't know how long a page has been dirty, or how often it > >>gets used, > > > >In an efficient system, the FS will never get to know this, either. > > I don't understand this statement. If dereferencing a vfs op for > every page aging is too expensive, then ask it to age more than one > page at a time. Or do I miss your meaning? Please repeat after me: "THE FS DOES NOT SEE THE MMU ACCESSED BITS" Also, if a piece of data is in the page cache, it is accessed without calling the filesystem code. This means the filesystem doesn't know how often pages are or are not used, hence it cannot make the decisions the VM makes. Or do you want to have your own ReiserVM and ReiserPageCache ? regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:20 ` Rik van Riel @ 2002-01-22 22:31 ` Hans Reiser 2002-01-22 23:34 ` Rik van Riel 1 sibling, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-22 22:31 UTC (permalink / raw) To: Rik van Riel Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Let's try a non-reiserfs sub-cache example. Suppose you have a cache of objects that are smaller than a page. These might be dcache entries, these might be struct inodes, these might be all sorts of things in the kernel. Suppose that there is absolutely no correlation in access between the objects that are on the same page. Suppose that this subcache has methods for freeing however many of them it wants to free, and it can squeeze them together into fewer pages whenever it wants to. Suppose it can track accesses to the objects, and it could age them also, if we wrote the code to do it. If we age with page granularity as you ask us to, we are doing fundamentally the wrong thing. Aging with page granularity means that we keep in the cache every object that happens to land on the same page with a frequently accessed object even if those objects are never accessed again ever. Another wrong way: Ok, so suppose we have methods for shrinking the cache a la the old 2.2 dcache shrinking code. Suppose we invoke those whenever the cache gets "too large", or the other caches are failing to free pages because things have gotten SO pathologically imbalanced that they have nothing they can free. This is also bad. It results in unbalanced caches, and makes our VM maintainer think that subcaches are inherently bad. 
If we don't have a master VM pushing proportionally to their size on all subcaches, and telling them how many pages worth of aging to apply, we either have unused objects staying in memory because they happen to land on a page with a frequently used object, or we have unbalanced caches that know what to free but not how much to free. We need a master VM that says how much aging pressure to apply, and subcaches that respond to that. We need a VM that doesn't just delegate, but delegates skillfully enough that the subcaches know what they need to know to act on it. Hans Rik van Riel wrote: >On Tue, 22 Jan 2002, Hans Reiser wrote: > >>Rik van Riel wrote: >> >>>On Tue, 22 Jan 2002, Chris Mason wrote: >>> > >>>>The FS doesn't know how long a page has been dirty, or how often it >>>>gets used, >>>> >>>In an efficient system, the FS will never get to know this, either. >>> >>I don't understand this statement. If dereferencing a vfs op for >>every page aging is too expensive, then ask it to age more than one >>page at a time. Or do I miss your meaning? >> > >Please repeat after me: > > "THE FS DOES NOT SEE THE MMU ACCESSED BITS" > We can't borrow whatever pair of glasses the master VM is using? > > >Also, if a piece of data is in the page cache, it is accessed >without calling the filesystem code. > > >This means the filesystem doesn't know how often pages are or >are not used, hence it cannot make the decisions the VM make. > >Or do you want to have your own ReiserVM and ReiserPageCache ? > >regards, > >Rik > ^ permalink raw reply [flat|nested] 92+ messages in thread
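Hans's objection can be made concrete with a toy model (all names and numbers invented, not kernel code): with uncorrelated small objects packed many to a page, dropping the unused objects and squeezing the survivors together releases whole pages that page-granularity aging would keep pinned by a single hot object per page.

```c
/* Illustrative model of the small-object subcache argument: objects
 * smaller than a page with uncorrelated accesses. Freeing unused
 * objects and squeezing survivors into fewer pages releases whole
 * pages, which page-granularity aging could never do. */
#define OBJS_PER_PAGE 16

/* Given counts of live objects before and after dropping the unused
 * ones, how many whole pages does compaction give back? */
static unsigned long pages_freed_by_squeeze(unsigned long live_before,
                                            unsigned long live_after)
{
    unsigned long pages_before =
        (live_before + OBJS_PER_PAGE - 1) / OBJS_PER_PAGE;
    unsigned long pages_after =
        (live_after + OBJS_PER_PAGE - 1) / OBJS_PER_PAGE;
    return pages_before - pages_after;
}
```

For example, 160 objects spread across 10 pages, of which only 40 are still in use, compact down to 3 pages; a page-granularity ager would free nothing if each of the 10 pages held even one hot object.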
* Re: Possible Idea with filesystem buffering. 2002-01-22 22:31 ` Hans Reiser @ 2002-01-22 23:34 ` Rik van Riel 0 siblings, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-22 23:34 UTC (permalink / raw) To: Hans Reiser Cc: Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel On Wed, 23 Jan 2002, Hans Reiser wrote: > Let's try a non-reiserfs sub-cache example. Suppose you have a cache > of objects that are smaller than a page. > Suppose that there is absolutely no correlation in access between the > objects that are on the same page. Suppose that this subcache has > methods for freeing however many of them it wants to free, and it can > squeeze them together into fewer pages whenever it wants to. In this case I absolutely agree with you. In this case it is also _possible_ because all access to these data structures goes through the filesystem code, so the filesystem knows exactly which object is a candidate for freeing and which isn't. I think the last messages from the thread were a miscommunication between us -- I was under the impression that you wanted per-filesystem freeing decisions for things like page cache pages. kind regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 20:20 ` Rik van Riel 2002-01-22 22:31 ` Hans Reiser @ 2002-01-23 17:15 ` Josh MacDonald 1 sibling, 0 replies; 92+ messages in thread From: Josh MacDonald @ 2002-01-23 17:15 UTC (permalink / raw) To: Rik van Riel Cc: Hans Reiser, Chris Mason, Andreas Dilger, Shawn Starr, linux-kernel, ext2-devel Quoting Rik van Riel (riel@conectiva.com.br): > On Tue, 22 Jan 2002, Hans Reiser wrote: > > Rik van Riel wrote: > > >On Tue, 22 Jan 2002, Chris Mason wrote: > > > >>The FS doesn't know how long a page has been dirty, or how often it > > >>gets used, > > > > > >In an efficient system, the FS will never get to know this, either. > > > > I don't understand this statement. If dereferencing a vfs op for > > every page aging is too expensive, then ask it to age more than one > > page at a time. Or do I miss your meaning? > > Please repeat after me: > > "THE FS DOES NOT SEE THE MMU ACCESSED BITS" > > Also, if a piece of data is in the page cache, it is accessed > without calling the filesystem code. > > > This means the filesystem doesn't know how often pages are or > are not used, hence it cannot make the decisions the VM make. > > Or do you want to have your own ReiserVM and ReiserPageCache ? Rik, We think there are good reasons for the FS to know when and how its data is accessed, although this issue is less significant than the semantics of writepage() being discussed in this thread. Referring to the transaction design document I posted several months ago: http://marc.theaimsgroup.com/?l=linux-kernel&m=100510090926874&w=2 Our intention is for the file system to be capable of tracking read and write data-dependencies so that it can safely defer writing batches of data and still guarantee consistent crash recovery from the application's point of view. 
The interaction between transactions and mmaped regions may leave something to be desired, and we may not need to know how often pages are or are not used, but we would like to know which pages are read and written by whom, even for the case of a page cache hit. -josh -- PRCS version control system http://sourceforge.net/projects/prcs Xdelta storage & transport http://sourceforge.net/projects/xdelta Need a concurrent skip list? http://sourceforge.net/projects/skiplist ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 23:11 ` Rik van Riel 2002-01-20 23:40 ` Shawn Starr @ 2002-01-21 0:28 ` Hans Reiser 2002-01-21 0:47 ` Rik van Riel 1 sibling, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 0:28 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Sun, 20 Jan 2002, Shawn Starr wrote: > >>But why should each filesystem have to have a different method of >>buffering/caching? that just doesn't fit the layered model of the >>kernel IMHO. >> > >I think Hans will give up the idea once he realises the >performance implications. ;) > >Rik > Rik, what reiser4 does is take a slum (a slum is a set of dirty buffers that are contiguous in tree order), and just before flushing it to disk we squeeze the entire slum as far to the left as we can, and encrypt any parts of it that we need to encrypt, and assign block numbers to it. Tree balancing normally has a tradeoff between memory copies performed on average per insertion, and tightness in packing nodes. Squeezing in response to memory pressure greatly optimizes the number of nodes we are packed into while only performing one memory copy just before flush time for that optimization. It is MUCH more efficient. Block allocation a la XFS can be much more optimal if done just before flushing. Encryption just before flushing rather than with every modification to a file is also much more efficient. Committing transactions also has a complex need to be memory-pressure driven (complex enough that I won't describe it here). So, really, memory pressure needs to push a whole set of events in a well-designed filesystem. Thinking that you can just pick a page and write it and write no other pages, all without understanding the optimizations of the filesystem you write to, is simplistic. Suppose we do what you ask, and always write the page (as well as some other pages) to disk. 
This will result in the filesystem cache as a whole receiving more pressure than other caches that only write one page in response to pressure. This is unbalanced, leads to some caches having shorter average page lifetimes than others, and it is therefore suboptimal. Yes? Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
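The slum-squeezing step Hans describes can be modelled crudely. This is an invented illustration, not reiser4 code: packing the items of a dirty run leftward reduces the number of nodes that then need block numbers assigned and writes issued.

```c
/* Hypothetical model of squeezing a "slum" (a run of dirty tree
 * nodes) just before flush: items are packed leftward so the run
 * occupies as few nodes as possible, and each node squeezed away is
 * one fewer block to allocate and write. */
#include <stddef.h>

/* fills[] holds how many item-bytes each dirty node currently uses,
 * capacity is the node size; return how many nodes remain after
 * packing everything to the left. */
static size_t squeeze_slum(const unsigned *fills, size_t n, unsigned capacity)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += fills[i];
    return (size_t)((total + capacity - 1) / capacity);
}
```

Deferring the squeeze to flush time is the efficiency point: each item is memory-copied once, at the last moment, instead of being rebalanced on every insertion.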
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:28 ` Hans Reiser @ 2002-01-21 0:47 ` Rik van Riel 2002-01-21 1:01 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 0:47 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Suppose we do what you ask, and always write the page (as well as some > other pages) to disk. This will result in the filesystem cache as a > whole receiving more pressure than other caches that only write one > page in response to pressure. This is unbalanced, leads to some > caches having shorter average page lifetimes than others, and it is > therefor suboptimal. Yes? If your ->writepage() writes pages to disk it just means that reiserfs will be able to clean its pages faster than the other filesystems. This means the VM will not call reiserfs ->writepage() as often as for the other filesystems, since more of the pages it finds will already be clean and freeable. I guess the only way to unbalance the caches is by actually freeing pages in ->writepage, but I don't see any real reason why you'd want to do that... regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 0:47 ` Rik van Riel @ 2002-01-21 1:01 ` Hans Reiser 2002-01-21 1:21 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 1:01 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Suppose we do what you ask, and always write the page (as well as some >>other pages) to disk. This will result in the filesystem cache as a >>whole receiving more pressure than other caches that only write one >>page in response to pressure. This is unbalanced, leads to some >>caches having shorter average page lifetimes than others, and it is >>therefor suboptimal. Yes? >> > >If your ->writepage() writes pages to disk it just means >that reiserfs will be able to clean its pages faster than >the other filesystems. > the logical extreme of this is that no write caching should be done at all, only read caching? > > >This means the VM will not call reiserfs ->writepage() as >often as for the other filesystems, since more of the >pages it finds will already be clean and freeable. > >I guess the only way to unbalance the caches is by actually >freeing pages in ->writepage, but I don't see any real reason >why you'd want to do that... > >regards, > >Rik > It would unbalance the write cache, not the read cache. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:01 ` Hans Reiser @ 2002-01-21 1:21 ` Rik van Riel 2002-01-21 1:26 ` Hans Reiser 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-21 1:21 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > Rik van Riel wrote: > >If your ->writepage() writes pages to disk it just means > >that reiserfs will be able to clean its pages faster than > >the other filesystems. > > the logical extreme of this is that no write caching should be done at > all, only read caching? You know that's bad for write clustering ;))) > >This means the VM will not call reiserfs ->writepage() as > >often as for the other filesystems, since more of the > >pages it finds will already be clean and freeable. > > > >I guess the only way to unbalance the caches is by actually > >freeing pages in ->writepage, but I don't see any real reason > >why you'd want to do that... > > It would unbalance the write cache, not the read cache. Many workloads tend to read pages again after they've written them, so throwing away pages immediately doesn't seem like a good idea. regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:21 ` Rik van Riel @ 2002-01-21 1:26 ` Hans Reiser 2002-01-21 1:40 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Hans Reiser @ 2002-01-21 1:26 UTC (permalink / raw) To: Rik van Riel; +Cc: Shawn Starr, linux-kernel Rik van Riel wrote: >On Mon, 21 Jan 2002, Hans Reiser wrote: > >>Rik van Riel wrote: >> > >>>If your ->writepage() writes pages to disk it just means >>>that reiserfs will be able to clean its pages faster than >>>the other filesystems. >>> >>the logical extreme of this is that no write caching should be done at >>all, only read caching? >> > >You know that's bad for write clustering ;))) > >>>This means the VM will not call reiserfs ->writepage() as >>>often as for the other filesystems, since more of the >>>pages it finds will already be clean and freeable. >>> >>>I guess the only way to unbalance the caches is by actually >>>freeing pages in ->writepage, but I don't see any real reason >>>why you'd want to do that... >>> >>It would unbalance the write cache, not the read cache. >> > >Many workloads tend to read pages again after they've written >them, so throwing away pages immediately doesn't seem like a >good idea. > I think I must have said free when I meant clean, and this naturally confused you. writepage() cleans pages, which is sometimes necessary for freeing them, but it does not free them itself. The one place where we would free them is when we repack slums before writing them. In this case, an empty node is not going to get accessed again, so it should be freed. Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-21 1:26 ` Hans Reiser @ 2002-01-21 1:40 ` Rik van Riel 0 siblings, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-21 1:40 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn Starr, linux-kernel On Mon, 21 Jan 2002, Hans Reiser wrote: > I think I must have said free when I meant clean, and this naturally > confused you. > > writepage() cleans pages, which is sometimes necessary for freeing them, > but it does not free them itself. > > The one place where we would free them is when we repack slums before > writing them. In this case, an empty node is not going to get accessed > again, so it should be freed. Agreed. Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 9:04 Possible Idea with filesystem buffering Shawn 2002-01-20 11:31 ` Hans Reiser @ 2002-01-20 15:49 ` Anton Altaparmakov 2002-01-20 21:21 ` Hans Reiser 1 sibling, 1 reply; 92+ messages in thread From: Anton Altaparmakov @ 2002-01-20 15:49 UTC (permalink / raw) To: Hans Reiser; +Cc: Shawn, linux-kernel At 11:31 20/01/02, Hans Reiser wrote: >In version 4 of reiserfs, our plan is to implement writepage such that it >does not write the page but instead pressures the reiser4 cache and marks >the page as recently accessed. This is Linus's preferred method of doing that. But why do you want to do your own cache? Any individual fs driver is in no position to know the overall demands on the VMM of the currently running kernel/user programs/etc. As such it is IMHO inefficient, and I think it won't actually work: the VMM needs to free specific memory and hence calls writepage on that specific memory so it can throw the pages away afterwards, but in your concept writepage won't result in the page being marked clean, so the VM has made no progress and you have just created a whole load of headaches for the VMM which it can't solve... The VMM should be the ONLY thing in the kernel that has full control of all caches in the system, and certainly all fs caches. Why you are putting a second cache layer underneath the VMM is beyond me. It would be much better to fix/expand the capabilities of the existing VMM, which would benefit all filesystems, not just ReiserFS. >Personally, I think that makes writepage the wrong name for that function, >but I must admit it gets the job done, and it leaves writepage as the >right name for all filesystems that don't manage their own cache, which is >most of them. Yes it does make it the wrong name, but not only that, it also breaks the existing VMM if I understand anything about the VMM (which may of course not be the case...). Just a thought. 
Best regards, Anton >Hans > >Shawn wrote: > >>I've noticed that XFS's filesystem has a separate pagebuf_daemon to handle >>caching/buffering. >> >>Why not make a kernel page/caching daemon for other filesystems to use >>(kpagebufd) so that each filesystem can use a kernel daemon interface to >>handle buffering and caching. >> >>I found that XFS's buffering/caching significantly reduced I/O load on the >>system (with riel's rmap11b + rml's preempt patches and Andre's IDE >>patch). >> >>But I've not been able to acheive the same speed results with ReiserFS :-( >> >>Just as we have a filesystem (VFS) layer, why not have a buffering/caching >>layer for the filesystems to use inconjunction with the VM? >There is hostility to this from one of the VM maintainers. He is >concerned that separate caches were what they had before and they behaved >badly. I think that they simply coded them wrong the time before. The >time before, the pressure on the subcaches was uneven, with some caches >only getting pressure if the other caches couldn't free anything, so of >course it behaved badly. > >> >> >>Comments, suggestions, flames welcome ;) >> >>Shawn. >> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >>Please read the FAQ at http://www.tux.org/lkml/ >> > > > >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ -- "I've not lost my mind. It's backed up on tape somewhere." - Unknown -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/ ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-20 15:49 ` Anton Altaparmakov @ 2002-01-20 21:21 ` Hans Reiser 0 siblings, 0 replies; 92+ messages in thread From: Hans Reiser @ 2002-01-20 21:21 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Shawn, linux-kernel Anton Altaparmakov wrote: > At 11:31 20/01/02, Hans Reiser wrote: > >> In version 4 of reiserfs, our plan is to implement writepage such >> that it does not write the page but instead pressures the reiser4 >> cache and marks the page as recently accessed. This is Linus's >> preferred method of doing that. > > > But why do you want to do your own cache? Any individual fs driver is > in no position to know the overall demands on the VMM of the currently > running kernel/user programs/etc. So the VM system should inform it. The way to do that is to convey a sense of cache pressure that is in proportion to the size of the cache used by that cache submanager, and then the cache submanager has to react proportionally. If every writepage is considered a pressure increment, and if the page is marked accessed, then proportional pressure is achieved. > As such it is IMHO inefficient and I think it won't actually work due > to VMM requiring to free specific memory and hence calling writepage > on that specific memory so it can throw the pages away afterwards but > in your concept writepage won't result in the page being marked clean > and the vm has made no progress and you have just created a hole load > of headaches for the VMM which it can't solve... > > The VMM should be the ONLY thing in the kernel that has full control > of all caches in the system, and certainly all fs caches. Why you are > putting a second cache layer underneath the VMM is beyond me. It would > be much better to fix/expand the capabilities of the existing VMM > which would have the benefit that all fs could benefit not just ReiserFS. 
> I agree, except that using writepage is what Linus wants, and except for the DMA bug Rik mentions, it should work. It would be nice if the VM maintainers were to document writepage so that other filesystems could know how to use it (and fix the DMA bug Rik mentions). Hans ^ permalink raw reply [flat|nested] 92+ messages in thread
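Hans's proportional-pressure scheme can be sketched as a small userspace model. This is purely illustrative: the `struct subcache` type and the function names below are invented for this sketch and bear no relation to actual reiser4 code. The idea is only that each writepage() call counts as one unit of pressure, so a sub-cache that holds more pages receives more calls and sheds proportionally more pages.

```c
#include <assert.h>

/* Illustrative userspace model (NOT reiser4 code): a filesystem that
 * manages its own cache treats each writepage() call from the VM as one
 * unit of pressure rather than as a command to write that page.  If the
 * VM scans pages evenly, each sub-cache receives writepage calls in
 * proportion to its share of memory, and reacting one-page-per-call
 * yields proportional eviction across sub-caches. */

struct subcache {
    int pages;     /* pages currently held by this sub-cache */
    int pressure;  /* writepage calls received since last shrink */
};

/* VM side: one writepage per page scanned; the sub-cache just counts. */
static void fs_writepage(struct subcache *c)
{
    c->pressure++;
}

/* FS side: react proportionally -- free one page per unit of pressure,
 * capped by what the cache actually holds. */
static int subcache_shrink(struct subcache *c)
{
    int freed = c->pressure < c->pages ? c->pressure : c->pages;
    c->pages -= freed;
    c->pressure = 0;
    return freed;
}
```

With two sub-caches of 800 and 200 pages and a VM pass that touches 10% of each, the larger cache gives up four times as many pages, which is the "proportional pressure" Hans describes.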
* Re: Possible Idea with filesystem buffering. @ 2002-01-22 21:02 Rolf Lear 0 siblings, 0 replies; 92+ messages in thread From: Rolf Lear @ 2002-01-22 21:02 UTC (permalink / raw) To: linux-kernel; +Cc: Rik van Riel, reiser Excuse me for being a kernel newbie (and a list lurker), and for simplifying what is obviously a complex issue... but ... the underlying issue really is simple. Further, you are both suggesting that a change in design is not out-of-the-question. The VM is responsible for making sure that Mem is used efficiently. The FS is responsible for making sure that the disks (both space and speed) are used efficiently. Now, I have followed this thread for days, and I agree with Rik that the VM should be able to tell (command) the FS to free a page. I agree with Hans that "Ideally", the VM should be capable of identifying the best page to free (in terms of cost to the FS). In this ideal world, it is the responsibility of an intelligent FS to inform an intelligent VM what it can do quickly, and what will take time. What I propose is either: a) An indication on each dirty page of the cost required to clean it. b) An FS function which can be called to indicate the cost of a clean. This cost should be measured in terms of something relevant like approximate IO time. FS's which do not support this system should have stubs which cost all pages equally. The system would work as follows: VM needs to free some Mem, and not enough clean pages can be freed. VM identifies those dirty pages which are cheap to flush/clean, and does it. If the VM needs to flush an expensive page, it can still do it, but it knows the price ahead of time (double bonus). To identify the cheap pages, the VM can ask the FS the price, and as an added bonus, the FS can tell the VM how many other pages will get freed in the process. In my world of client-server / databases / etc, this just makes sense. If this intelligent VM has a basic FS, it loses nothing. 
If it has an intelligent FS, it has more information to make better decisions. Rolf ^ permalink raw reply [flat|nested] 92+ messages in thread
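Rolf's cost-based scheme could look roughly like this as a userspace sketch. The `page_info` structure and `pick_victim` function are hypothetical names invented here; the cost units stand in for the "approximate IO time" he suggests, and clean pages cost nothing to reclaim:

```c
#include <assert.h>

/* Sketch of cost-aware victim selection (hypothetical interface, not
 * kernel code): the FS annotates each dirty page with an approximate
 * clean cost in I/O-time units; clean pages are free to drop.  The VM
 * then reclaims the cheapest page first, but still *can* pick an
 * expensive one -- it just knows the price ahead of time. */

struct page_info {
    int dirty;       /* nonzero if the page must be written before freeing */
    int clean_cost;  /* FS-provided estimate of I/O time to clean it */
};

/* Return the index of the cheapest page to reclaim. */
static int pick_victim(const struct page_info *p, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        int cost = p[i].dirty ? p[i].clean_cost : 0;
        int best_cost = best < 0 ? 0 : (p[best].dirty ? p[best].clean_cost : 0);
        if (best < 0 || cost < best_cost)
            best = i;
    }
    return best;
}
```

An FS that does not support the scheme would report equal cost for every page (the "stubs" Rolf mentions), in which case this degenerates to picking the first candidate.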
[parent not found: <Pine.LNX.4.33L.0201222008280.32617-100000@imladris.surriel.com>]
* Re: Possible Idea with filesystem buffering. [not found] <Pine.LNX.4.33L.0201222008280.32617-100000@imladris.surriel.com> @ 2002-01-22 23:31 ` Shawn Starr 2002-01-22 23:37 ` Rik van Riel 0 siblings, 1 reply; 92+ messages in thread From: Shawn Starr @ 2002-01-22 23:31 UTC (permalink / raw) To: Rik van Riel; +Cc: Linux The only functionality added to the kernel would be an interface for filesystems to share; it would basically create kpagebuf_* functions. Shawn. On Tue, 2002-01-22 at 17:08, Rik van Riel wrote: > On 22 Jan 2002, Shawn Starr wrote: > > > the pagebuf daemon would use try_to_free_pages() periodically in its > > queue. > > So it wouldn't add any functionality to the kernel ? > > Rik > -- > "Linux holds advantages over the single-vendor commercial OS" > -- Microsoft's "Competing with Linux" document > > http://www.surriel.com/ http://distro.conectiva.com/ > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 23:31 ` Shawn Starr @ 2002-01-22 23:37 ` Rik van Riel 2002-01-23 5:26 ` Shawn Starr 0 siblings, 1 reply; 92+ messages in thread From: Rik van Riel @ 2002-01-22 23:37 UTC (permalink / raw) To: Shawn Starr; +Cc: Linux On 22 Jan 2002, Shawn Starr wrote: > The only functionality added to the kernel would be an interface for > filesystems to share; it would basically create kpagebuf_* functions. What would these things achieve ? It would be nice if you could give us a quick explanation of what exactly kpagebufd is supposed to do, if only so I can keep that in mind while working on the VM ;) Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-22 23:37 ` Rik van Riel @ 2002-01-23 5:26 ` Shawn Starr 0 siblings, 0 replies; 92+ messages in thread From: Shawn Starr @ 2002-01-23 5:26 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel The VM is busy with other tasks, so why not have a daemon handle pages delegated from the VM? Having a pagebuf daemon would allow for delayed writes and perhaps read-ahead buffering of data; having these would take some pressure off the VM, no? On Tue, 2002-01-22 at 18:37, Rik van Riel wrote: > On 22 Jan 2002, Shawn Starr wrote: > > > The only functionality added to the kernel would be an interface for > > filesystems to share; it would basically create kpagebuf_* functions. > > What would these things achieve ? > > It would be nice if you could give us a quick explanation of > what exactly kpagebufd is supposed to do, if only so I can > keep that in mind while working on the VM ;) > > Rik > -- > "Linux holds advantages over the single-vendor commercial OS" > -- Microsoft's "Competing with Linux" document > > http://www.surriel.com/ http://distro.conectiva.com/ > ^ permalink raw reply [flat|nested] 92+ messages in thread
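As a rough model of the delayed-write part of Shawn's kpagebufd proposal, consider the sketch below. No such interface exists in the kernel; every name here (`kpagebuf_queue`, `kpagebuf_dirty`, `KPB_FLUSH_AT`) is invented for illustration. The point is simply that a shared daemon could batch queued dirty pages and flush them once a backlog threshold is reached, instead of writing each page immediately:

```c
#include <assert.h>

/* Hypothetical kpagebufd backlog model (no such kernel interface
 * exists): filesystems enqueue dirty pages for delayed writeback; the
 * daemon wakes and flushes the whole backlog once it passes a
 * threshold, batching I/O. */

#define KPB_FLUSH_AT 8   /* flush once this many pages are queued */

struct kpagebuf_queue {
    int backlog;   /* dirty pages currently queued */
    int flushed;   /* total pages written out so far */
};

static void kpagebuf_dirty(struct kpagebuf_queue *q)
{
    q->backlog++;
    if (q->backlog >= KPB_FLUSH_AT) {   /* daemon wakes up */
        q->flushed += q->backlog;       /* batched write-out */
        q->backlog = 0;
    }
}
```

Rik's objection in the parent message still applies to this sketch: if the daemon ultimately just calls try_to_free_pages(), the batching adds no functionality the VM does not already have.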
* Re: Possible Idea with filesystem buffering. @ 2002-01-23 9:43 Martin Knoblauch 2002-01-23 11:52 ` Helge Hafting 0 siblings, 1 reply; 92+ messages in thread From: Martin Knoblauch @ 2002-01-23 9:43 UTC (permalink / raw) To: linux-kernel@vger.kernel.org > Re: Possible Idea with filesystem buffering. > > From: Richard B. Johnson (root@chaos.analogic.com) > Date: Tue Jan 22 2002 - 17:10:27 EST > > > We need a free-RAM target, possibly based upon a percentage of > available RAM. The lack of such a target is what causes the > out-of-RAM condition we have been experiencing. Somebody thought > that "free RAM is wasted RAM" and the VM has been based upon > that theory. That theory has been proven incorrect. You need Now, I think the theory itself is OK. The problem is that the stuff in buffer/caches is too sticky. It does not go away when "more important" uses for memory come up. Or at least it does not go away fast enough. > free RAM, just like you need "excess horsepower" to make > automobiles drivable. That free RAM is the needed "rubber-band" > to absorb the dynamics of real-world systems. > Correct. The free target would help to avoid the panic/frenzy that breaks out when we run out of free memory. Question: what about just setting a maximum limit on the cache/buffer size? Either absolute, or as a fraction of total available memory? Sure, it may be a waste of memory in most situations, but sometimes the administrator/user of a system simply "knows better" than the FVM (F == Fine ? :-) While we are at it, one could also add a "guaranteed minimum" limit for the cache/buffer size, preventing a complete meltdown of IO performance. Tru64 has such limits. They are usually at 100% (max) and I think 20% (min), giving the cache access to all memory. But there were situations where a max of 10% was just the right thing to do. I know, the tuning-knob approach is frowned upon. But sometimes there are workloads where even the best VM may not know how to react correctly. 
Martin -- ------------------------------------------------------------------ Martin Knoblauch | email: Martin.Knoblauch@TeraPort.de TeraPort GmbH | Phone: +49-89-510857-309 C+ITS | Fax: +49-89-510857-111 http://www.teraport.de | Mobile: +49-170-4904759 ^ permalink raw reply [flat|nested] 92+ messages in thread
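Martin's min/max limits amount to clamping the cache size between two tunable fractions of total memory, in the style of the Tru64 knobs he mentions. A minimal sketch (the function name and percentage-based interface are assumptions for illustration, not any real tunable):

```c
#include <assert.h>

/* Tunable cache bounds as percentages of total memory (Tru64-style
 * knobs as described above; hypothetical interface).  The cache may
 * grow to at most max_pct of total memory and is guaranteed at least
 * min_pct, so setting max_pct to 100 reproduces the unbounded default
 * behaviour. */
static int clamp_cache_pages(int want, int total, int min_pct, int max_pct)
{
    int lo = total * min_pct / 100;  /* guaranteed minimum */
    int hi = total * max_pct / 100;  /* hard ceiling */
    if (want < lo)
        return lo;
    if (want > hi)
        return hi;
    return want;
}
```

With min 20% / max 100% on 1000 pages the clamp is invisible unless the cache shrinks below 200 pages; with max 10% it caps the cache at 100 pages, the "administrator knows better" case Martin describes for his out-of-core workload.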
* Re: Possible Idea with filesystem buffering. 2002-01-23 9:43 Martin Knoblauch @ 2002-01-23 11:52 ` Helge Hafting 2002-01-23 12:02 ` Rik van Riel 2002-01-23 12:11 ` Martin Knoblauch 0 siblings, 2 replies; 92+ messages in thread From: Helge Hafting @ 2002-01-23 11:52 UTC (permalink / raw) To: m.knoblauch, linux-kernel Martin Knoblauch wrote: > > > Re: Possible Idea with filesystem buffering. > > > > From: Richard B. Johnson (root@chaos.analogic.com) > > Date: Tue Jan 22 2002 - 17:10:27 EST > > > > > > We need a free-RAM target, possibly based upon a percentage of > > available RAM. The lack of such a target is what causes the > > out-of-RAM condition we have been experiencing. Somebody thought > > that "free RAM is wasted RAM" and the VM has been based upon > > that theory. That theory has been proven incorrect. You need > As far as I know, there is a free target. The kernel will try to get rid of old pages (swapout program memory, toss cache pages) when there's too little free memory around. This keeps memory around so future allocations and IO requests may start immediately. Maybe the current target is too small, but it is there. Without it, _every_ allocation or file operation would block waiting for a swapout/cache flush in order to get free pages. Linux isn't nearly _that_ bad. > Now, I think the theory itself is OK. The problem is that the stuff in > buffer/caches is too sticky. It does not go away when "more important" > uses for memory come up. Or at least it does not go away fast enough. > Then we need a larger free target to cope with the slow cache freeing. > > free RAM, just like you need "excess horsepower" to make > > automobiles drivable. That free RAM is the needed "rubber-band" > > to absorb the dynamics of real-world systems. > > Question: what about just setting a maximum limit to the cache/buffer > size. Either absolute, or as a fraction of total available memory? 
Sure, > it may be a waste of memory in most situations, but sometimes the > administrator/user of a system simply "knows better" than the FVM (F == > Fine ? :-) [...] > I know, the tuning-knob approach is frowned upon. But sometimes there > are workloads where even the best VM may not know how to react > correctly. Wasting memory "in most situations" isn't really an option. But I see nothing wrong with "knobs" as long as they are automatic by default. Those who want to optimize for a corner case can go and turn off the autopilot. Helge Hafting ^ permalink raw reply [flat|nested] 92+ messages in thread
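The free target Helge describes is essentially a pair of watermarks: reclaim starts when free pages fall below a low mark and continues until a higher mark is reached, keeping a "rubber-band" of free RAM for bursts. A toy model of that behaviour (names invented; real 2.4 zone balancing is considerably more involved):

```c
#include <assert.h>

/* Toy watermark model of a free-memory target (illustrative only; the
 * struct and function names are invented and the real 2.4 VM zone
 * balancing is far more complex).  Reclaim kicks in below the low
 * watermark and evicts freeable cache pages until the high watermark
 * is reached, leaving slack for future allocations. */

struct zone_sim {
    int free, cache;   /* free pages, freeable cache pages */
    int low, high;     /* watermarks: start / stop reclaiming */
};

static int balance(struct zone_sim *z)
{
    int reclaimed = 0;
    if (z->free >= z->low)
        return 0;                  /* enough slack, nothing to do */
    while (z->free < z->high && z->cache > 0) {
        z->cache--;                /* drop one clean cache page */
        z->free++;
        reclaimed++;
    }
    return reclaimed;
}
```

Raising `low`/`high` is the "larger free target" Helge suggests for coping with slow cache freeing; making the per-page eviction cheaper is Rik's alternative in the next message.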
* Re: Possible Idea with filesystem buffering. 2002-01-23 11:52 ` Helge Hafting @ 2002-01-23 12:02 ` Rik van Riel 2002-01-23 12:11 ` Martin Knoblauch 1 sibling, 0 replies; 92+ messages in thread From: Rik van Riel @ 2002-01-23 12:02 UTC (permalink / raw) To: Helge Hafting; +Cc: m.knoblauch, linux-kernel On Wed, 23 Jan 2002, Helge Hafting wrote: [free memory is wasted memory] > > Now, I think the theory itself is OK. The problem is that the stuff in > > buffer/caches is too sticky. It does not go away when "more important" > > uses for memory come up. Or at least it does not go away fast enough. > > Then we need a larger free target to cope with the slow cache freeing. Or we make the cache freeing faster. ;) If you have the time, you might want to try -rmap some day and see about the cache freeing... regards, Rik -- "Linux holds advantages over the single-vendor commercial OS" -- Microsoft's "Competing with Linux" document http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: Possible Idea with filesystem buffering. 2002-01-23 11:52 ` Helge Hafting 2002-01-23 12:02 ` Rik van Riel @ 2002-01-23 12:11 ` Martin Knoblauch 1 sibling, 0 replies; 92+ messages in thread From: Martin Knoblauch @ 2002-01-23 12:11 UTC (permalink / raw) To: linux-kernel; +Cc: Helge Hafting Helge Hafting wrote: > > Martin Knoblauch wrote: > > > > > Re: Possible Idea with filesystem buffering. > > > > > > From: Richard B. Johnson (root@chaos.analogic.com) > > > Date: Tue Jan 22 2002 - 17:10:27 EST > > > > > > > > > We need a free-RAM target, possibly based upon a percentage of > > > available RAM. The lack of such a target is what causes the > > > out-of-RAM condition we have been experiencing. Somebody thought > > > that "free RAM is wasted RAM" and the VM has been based upon > > > that theory. That theory has been proven incorrect. You need > > > As far as I know, there is a free target. The kernel will try to get > rid of old pages (swapout program memory, toss cache pages) > when there's too little free memory around. This keeps memory > around so future allocations and IO requests may start > immediately. Maybe the current target is too small, but it is there. > Without it, _every_ allocation or file operation would block > waiting for a swapout/cache flush in order to get free pages. Linux > isn't nearly _that_ bad. > Nobody said it is _that_ bad. There are just some [maybe rare] situations where it falls over and does not recover gracefully. > > Now, I think the theory itself is OK. The problem is that the stuff in > > buffer/caches is too sticky. It does not go away when "more important" > > uses for memory come up. Or at least it does not go away fast enough. > > > Then we need a larger free target to cope with the slow cache freeing. > And as Rik said, we need to make freeing cache faster. All of this will help the 98+% cases that the VM can be optimized for. But I doubt that you can make it 100% and keep it simple at the same time. 
> > > free RAM, just like you need "excess horsepower" to make > > > automobiles drivable. That free RAM is the needed "rubber-band" > > > to absorb the dynamics of real-world systems. > > > > Question: what about just setting a maximum limit to the cache/buffer > > size. Either absolute, or as a fraction of total available memory? Sure, > > it may be a waste of memory in most situations, but sometimes the > > administrator/user of a system simply "knows better" than the FVM (F == > > Fine ? :-) > [...] > > I know, the tuning-knob approach is frowned upon. But sometimes there > > are workloads where even the best VM may not know how to react > > correctly. > > Wasting memory "in most situations" isn't really an option. But I > see nothing wrong with "knobs" as long as they are automatic by > default. Those who want to optimize for a corner case can > go and turn off the autopilot. > Definitely. The defaults need to be set for the general case. Martin -- ------------------------------------------------------------------ Martin Knoblauch | email: Martin.Knoblauch@TeraPort.de TeraPort GmbH | Phone: +49-89-510857-309 C+ITS | Fax: +49-89-510857-111 http://www.teraport.de | Mobile: +49-170-4904759 ^ permalink raw reply [flat|nested] 92+ messages in thread
[parent not found: <Pine.LNX.4.33.0201231301560.24338-100000@coffee.psychology.mcmaster.ca>]
[parent not found: <3C4FC478.BCC44CDF@TeraPort.de>]
[parent not found: <3C4FDB80.C9F83EBB@aitel.hist.no>]
* Re: Possible Idea with filesystem buffering. [not found] ` <3C4FDB80.C9F83EBB@aitel.hist.no> @ 2002-01-24 13:59 ` Martin Knoblauch 0 siblings, 0 replies; 92+ messages in thread From: Martin Knoblauch @ 2002-01-24 13:59 UTC (permalink / raw) To: Helge Hafting; +Cc: linux-kernel@vger.kernel.org Helge Hafting wrote: > > Martin Knoblauch wrote: > > > you are correct in stating that it is [still] true that free memory is > > wasted memory. The problem is that the "pool of trivially-freeable > > pages" is under certain circumstances apparently not trivially-freeable > > enough. And the pool has the tendency to push out processes into swap. > > OK, most times these processes have been inactive for quite some time, > > but - and this is my opinion based on quite a few years in this field - > > it should never do this. Task memory is "more valuable" than > ^^^^^ > > buffer/cache memory. At least I want (demand :-) a switch to make the VM > > behave that way. > > More valuable perhaps, but infinitely more valuable? It depends on the situation :-) That's why I want to be able to tell the VM that it should keep its greedy fingers off task memory. > Do you want swapping to happen _only_ if the process memory alone > overflows available memory? Note that you'll get really unusable > fs performance if the page cache _never_ may push anything into > swap. That means you have _no_ cache left as soon as process > memory fills RAM completely. No cache at all. > There are border cases where I may live better with only very few cache pages left. Most likely not on a web server or similar. But there are applications in the HPTC field, where FS caching is useless, unless you can pack everything in cache. In a former life I benchmarked an out-of-core FEM solver on Alpha/Tru64. Sure, when we could stick the whole scratch dataset in cache the performance was awesome. 
Unfortunately the dataset was about 40 GB and we only had 16 GB available (several constraints, one of them the price the customer was willing to pay :-(. The [very very well tuned] IO pattern on the scratch dataset resulted in optimal performance when the cache was turned off. The optimal system would have been the next-higher-class box with 48 GB of memory and a 40 GB ramdisk. Of course, we couldn't propose that :-(( That is why I want to be able to set maximum *and* minimum cache size. The maximum setting helps me tune application performance (setting it to 100% just means the current behaviour) and setting a minimum guarantees at least some minimal FS performance. > The balance between cache and other memory may need tweaking, > but don't bother going too far. > As I said, it depends on the situation. I am happy when 98+% of the systems can run happily with the defaults. But for the last 2% the tuning-knobs come in handy. And sure - use at your own risk. All warranties void. > > And yes, quite a few smart people are working on it. But the progress > > in the 2.4.x series is pretty slow and the direction still seems to be > > unclear. > > There are both aa patches and Rik's rmap patch. Hard to say who "wins", > but you can influence the choice by testing one or both and post > your findings. > Some of us try. Up to the point where we become annoying :-) Personally I do not think that one of them needs to win. There is very useful/successful stuff in both of them - not to forget that -aa is much more than just VM work. I see it this way: -aa is an approach to fix the obvious bugs and make the current system behave great/better/acceptable. rmap is more on the infrastructure side - enabling new stuff to be done. Similar things could be said about preempt vs. ll in some other much-too-long thread. 
Martin PS: Putting lkml back -- ------------------------------------------------------------------ Martin Knoblauch | email: Martin.Knoblauch@TeraPort.de TeraPort GmbH | Phone: +49-89-510857-309 C+ITS | Fax: +49-89-510857-111 http://www.teraport.de | Mobile: +49-170-4904759 ^ permalink raw reply [flat|nested] 92+ messages in thread
end of thread, other threads:[~2002-01-24 14:00 UTC | newest]
Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-20 9:04 Possible Idea with filesystem buffering Shawn
2002-01-20 11:31 ` Hans Reiser
2002-01-20 13:56 ` Rik van Riel
2002-01-20 14:21 ` Hans Reiser
2002-01-20 15:13 ` Rik van Riel
2002-01-20 21:15 ` Hans Reiser
2002-01-20 21:24 ` Rik van Riel
2002-01-20 21:30 ` Hans Reiser
2002-01-20 21:40 ` Rik van Riel
2002-01-20 21:49 ` Hans Reiser
2002-01-20 22:00 ` Rik van Riel
2002-01-21 0:10 ` Matt
2002-01-21 0:57 ` Hans Reiser
2002-01-21 1:28 ` Anton Altaparmakov
2002-01-21 2:29 ` Shawn Starr
2002-01-21 19:15 ` Shawn Starr
2002-01-22 22:02 ` Hans Reiser
2002-01-21 9:21 ` Horst von Brand
2002-01-21 9:13 ` Horst von Brand
2002-01-21 15:29 ` Eric W. Biederman
2002-01-20 17:51 ` Mark Hahn
2002-01-20 21:24 ` Hans Reiser
2002-01-20 21:32 ` Rik van Riel
2002-01-21 15:37 ` Eric W. Biederman
2002-01-20 22:45 ` Shawn Starr
2002-01-20 23:11 ` Rik van Riel
2002-01-20 23:40 ` Shawn Starr
2002-01-20 23:48 ` Rik van Riel
2002-01-21 0:44 ` Hans Reiser
2002-01-21 0:52 ` Rik van Riel
2002-01-21 1:08 ` Hans Reiser
2002-01-21 1:39 ` Rik van Riel
2002-01-21 11:10 ` Hans Reiser
2002-01-21 12:12 ` Rik van Riel
2002-01-21 13:42 ` Hans Reiser
2002-01-21 13:54 ` Rik van Riel
2002-01-21 14:07 ` Hans Reiser
2002-01-21 17:21 ` Chris Mason
2002-01-21 17:47 ` Hans Reiser
2002-01-21 19:44 ` Chris Mason
2002-01-21 20:41 ` Hans Reiser
2002-01-21 21:53 ` Chris Mason
2002-01-22 6:02 ` Andreas Dilger
2002-01-22 10:09 ` Tommi Kyntola
2002-01-22 11:39 ` Hans Reiser
2002-01-22 18:41 ` Andrew Morton
2002-01-22 19:03 ` Rik van Riel
2002-01-23 20:35 ` [Ext2-devel] " Stephen C. Tweedie
2002-01-23 20:48 ` Hans Reiser
2002-01-23 20:55 ` Andrew Morton
2002-01-23 23:53 ` Hugh Dickins
2002-01-24 0:01 ` Jeff Garzik
2002-01-22 20:19 ` Hans Reiser
2002-01-22 20:50 ` Rik van Riel
2002-01-22 14:03 ` Chris Mason
2002-01-22 14:39 ` Rik van Riel
2002-01-22 18:46 ` Hans Reiser
2002-01-22 19:19 ` Chris Mason
2002-01-22 20:13 ` Steve Lord
2002-01-22 21:22 ` Chris Mason
2002-01-22 20:32 ` Hans Reiser
2002-01-22 21:08 ` Chris Mason
2002-01-22 22:05 ` Hans Reiser
2002-01-22 22:21 ` Rik van Riel
2002-01-23 0:16 ` Hans Reiser
2002-01-22 22:10 ` Richard B. Johnson
2002-01-23 1:14 ` Stuart Young
2002-01-23 17:16 ` Daniel Phillips
2002-01-22 21:12 ` Rik van Riel
2002-01-22 21:28 ` Shawn Starr
2002-01-22 21:31 ` Rik van Riel
2002-01-22 20:20 ` Rik van Riel
2002-01-22 22:31 ` Hans Reiser
2002-01-22 23:34 ` Rik van Riel
2002-01-23 17:15 ` Josh MacDonald
2002-01-21 0:28 ` Hans Reiser
2002-01-21 0:47 ` Rik van Riel
2002-01-21 1:01 ` Hans Reiser
2002-01-21 1:21 ` Rik van Riel
2002-01-21 1:26 ` Hans Reiser
2002-01-21 1:40 ` Rik van Riel
2002-01-20 15:49 ` Anton Altaparmakov
2002-01-20 21:21 ` Hans Reiser
-- strict thread matches above, loose matches on Subject: below --
2002-01-22 21:02 Rolf Lear
[not found] <Pine.LNX.4.33L.0201222008280.32617-100000@imladris.surriel.com>
2002-01-22 23:31 ` Shawn Starr
2002-01-22 23:37 ` Rik van Riel
2002-01-23 5:26 ` Shawn Starr
2002-01-23 9:43 Martin Knoblauch
2002-01-23 11:52 ` Helge Hafting
2002-01-23 12:02 ` Rik van Riel
2002-01-23 12:11 ` Martin Knoblauch
[not found] <Pine.LNX.4.33.0201231301560.24338-100000@coffee.psychology.mcmaster.ca>
[not found] ` <3C4FC478.BCC44CDF@TeraPort.de>
[not found] ` <3C4FDB80.C9F83EBB@aitel.hist.no>
2002-01-24 13:59 ` Martin Knoblauch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox