* Idea about a disc backed ram filesystem
@ 2006-06-08 20:33 ` Sascha Nitsch
  2006-06-08 20:43 ` Lennart Sorensen
  ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread

From: Sascha Nitsch @ 2006-06-08 20:33 UTC (permalink / raw)
To: Linux Kernel Mailing List

Hi,

this is (as of this writing) just an idea.

=== current state ===
Currently we have RAM filesystems (like tmpfs) and disc based filesystems
(ext2/3, xfs, <insert your fav. fs>).

tmpfs is extremely fast but suffers data loss on restarts, crashes and
power outages. Disc access is slow compared to a RAM-based fs.

=== the idea ===
My idea is to mix them into the following hybrid:
- mount the new fs over an existing dir as an overlay
- all overlayed files are still accessible
- after the first read, a file stays in memory (like a file cache)
- all writes are flushed out to the underlying fs (maybe asynchronously)
- all reads are served from the memory cache once the data has been cached
- the cache stays until the partition is unmounted
- the maximum size of the overlayed filesystem could be physical RAM/2
  (like tmpfs)

=== advantages ===
Once the files are read, no more "slow" disc reading is needed => huge read
speed improvements (like on tmpfs).
If the writing is done asynchronously, write speeds would be as fast as
tmpfs => huge write speedup.
If done synchronously, write speed is almost as fast as a native disc fs.
The RAM fs would be immune to data loss from reboots or controlled
shutdowns.
If synchronous writes are used, the fs would be immune to crashes/power
outages (with the usual exceptions, as on a disc fs).

=== disadvantage ===
Possibly higher memory usage (see implementation ideas below).

=== usages ===
Possible usage scenarios are any kind of storage where a smallish set of
files gets read/written a lot, like databases.
Definition of smallish: let's say up to 50% of physical RAM size.
Depending on architecture and money spent, this can be a lot :)

=== implementation ideas ===
One note first:
I don't know the fs internals of the kernel (yet), so these ideas might
not work, but you should get the idea.

One idea is to build a complete virtual filesystem that connects to the
VFS layer and hands the writes through to the "original" fs driver.
The caching would be done in that layer. This might cause double caching
(in the I/O cache) and might waste memory.
But this idea would make async writes possible (when the disc has less to
do) and would improve write speed.

The other idea would be to modify the existing filesystem cache algorithm
to support a flag "always keep this file in memory".

The second one may be easier to do and may cause fewer side effects, but
might not allow async writes.

Since this overlay is done in the kernel, no other process could change
the files under the overlay.
Remote filesystems must be excluded from the cache layer (for obvious
reasons).

Any kind of feedback is welcome.

If this has been discussed earlier, sorry for the double posting. I
haven't found anything like this in the archives; just point me in the
right direction.

Regards,

Sascha Nitsch

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 20:33 ` Idea about a disc backed ram filesystem Sascha Nitsch
@ 2006-06-08 20:43 ` Lennart Sorensen
  2006-06-08 21:12   ` Sash
  2006-06-08 21:51 ` Horst von Brand
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread

From: Lennart Sorensen @ 2006-06-08 20:43 UTC (permalink / raw)
To: Sascha Nitsch; +Cc: Linux Kernel Mailing List

On Thu, Jun 08, 2006 at 10:33:13PM +0200, Sascha Nitsch wrote:
> this is (as of this writing) just an idea.
>
> [... full proposal quoted ...]

I am a bit puzzled. How is your idea different in use from the current
caching system that the kernel already applies to reads of all block
devices, other than essentially locking the cached data into RAM rather
than letting it get kicked out when it isn't used? Writing is similarly
cached unless the application asks for it not to be cached; it is flushed
out within a certain amount of time, or when there is an idle period. I
fail to see how having to explicitly mark something as cached-in-RAM and
locked is an improvement over simply caching anything that is used a lot,
from any disk.

Your idea also appears to break any application that asks for sync, since
you take over control of when things are flushed to disk.

I just don't get it. :)

Len Sorensen

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 20:43 ` Lennart Sorensen
@ 2006-06-08 21:12   ` Sash
  2006-06-09 10:45     ` Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread

From: Sash @ 2006-06-08 21:12 UTC (permalink / raw)
To: Linux Kernel Mailing List

On Thursday, 8 June 2006 22:43, you wrote:
> On Thu, Jun 08, 2006 at 10:33:13PM +0200, Sascha Nitsch wrote:
> > ....
>
> I am a bit puzzled. How is your idea different in use from the current
> caching system that the kernel already applies to reads of all block
> devices [...]
>
> I just don't get it. :)
>
> Len Sorensen

True, my idea is indeed similar to the existing cache; that's why I listed
modifying the cache as one of the implementation ideas. If you have ever
had the chance to run a database application on a tmpfs, you got to
"experience" the difference :)

The idea was simply born from wanting a fast tmpfs, but with the safety of
permanent data storage across reboots/crashes, and without modifying
user-level applications.

The problem with the current cache implementation is that I have little
control over what stays cached and what does not (which is fine for
normal usage). On a normal server with mixed load, my database caches are
flushed and the memory is reused for other things (like mail or web
server caches). If I access the database files again, they have to be
reloaded from disc, which slows things down.

The same applies to other applications as well; this is just an example
from my daily work (a ~1GB database on a 2GB RAM box, with a lot of disc
I/O from cache misses at a read/write ratio of ~20:1). Putting that DB on
a RAM fs is dangerous because of the risk of data loss.

The idea lets me have a defined set of files/dirs permanently cached,
taking the choice away from the kernel (for a fixed amount of memory and
a fixed set of files).

You are right that the idea in its current form may break applications
that ask for sync. Maybe the implementation can honour that by letting
such files be accessed directly.

If someone has a better idea for achieving the desired effect, feel free
to post it here. One of the reasons I posted the idea here is to get
useful comments from people with far more kernel/fs knowledge than I
have.

I hope I could clear the clouds a bit.

Sascha Nitsch

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 21:12   ` Sash
@ 2006-06-09 10:45     ` Jan Engelhardt
  0 siblings, 0 replies; 22+ messages in thread

From: Jan Engelhardt @ 2006-06-09 10:45 UTC (permalink / raw)
To: Sash; +Cc: Linux Kernel Mailing List

>> I am a bit puzzled. How is your idea different in use from the current
>> caching system that the kernel already applies to reads of all block
>> ...
>
> The idea was simply born from wanting a fast tmpfs, but with the safety
> of permanent data storage across reboots/crashes, and without modifying
> user-level applications.

When do you want to write to disk? At any time? That would hurt the
"fast" attribute, in which case you don't need a ramfs. Not at any time?
Potential loss of data. Hm.

Jan Engelhardt
--

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 20:33 ` Idea about a disc backed ram filesystem Sascha Nitsch
  2006-06-08 20:43 ` Lennart Sorensen
@ 2006-06-08 21:51 ` Horst von Brand
  2006-06-08 22:39   ` Joshua Hudson
  2006-06-08 22:48 ` Matheus Izvekov
  2006-06-09  6:33 ` Andi Kleen
  3 siblings, 1 reply; 22+ messages in thread

From: Horst von Brand @ 2006-06-08 21:51 UTC (permalink / raw)
To: Sascha Nitsch; +Cc: Linux Kernel Mailing List

Sascha Nitsch <Sash_lkl@linuxhowtos.org> wrote:
> this is (as of this writing) just an idea.
>
> === current state ===
> Currently we have RAM filesystems (like tmpfs) and disc based
> filesystems (ext2/3, xfs, <insert your fav. fs>).

Right.

> tmpfs is extremely fast but suffers data loss on restarts, crashes and
> power outages.

Part of the design tradeoffs.

> Disc access is slow compared to a RAM-based fs.

On-disk filesystems (and block device handling) are designed around that
fact.

> === the idea ===
> My idea is to mix them into the following hybrid:
> - mount the new fs over an existing dir as an overlay
> - all overlayed files are still accessible
> - after the first read, a file stays in memory (like a file cache)
> - all writes are flushed out to the underlying fs (maybe asynchronously)
> - all reads are served from the memory cache once the data has been
>   cached
> - the cache stays until the partition is unmounted
> - the maximum size of the overlayed filesystem could be physical RAM/2
>   (like tmpfs)

But the current on-disk filesystems already cache data in RAM extensively,
/without/ having to keep the whole file in memory, just the pieces
currently in active use. Your proposal takes that RAM away from the
caches, so it would be much /slower/ than the current on-disk filesystems.

BTW, many of the live-CD distributions do exactly this (a RAM overlay over
a CD-based filesystem).
--
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica              Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria             +56 32 654239
Casilla 110-V, Valparaiso, Chile         Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 21:51 ` Horst von Brand
@ 2006-06-08 22:39   ` Joshua Hudson
  0 siblings, 0 replies; 22+ messages in thread

From: Joshua Hudson @ 2006-06-08 22:39 UTC (permalink / raw)
To: linux-kernel

This just *screams* block layer. If anybody feels up to it, try making a
modified loopback device that implements an independent, fixed-size
write-through cache using vmalloc and such.

I have a hunch it won't really improve performance much.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 20:33 ` Idea about a disc backed ram filesystem Sascha Nitsch
  2006-06-08 20:43 ` Lennart Sorensen
  2006-06-08 21:51 ` Horst von Brand
@ 2006-06-08 22:48 ` Matheus Izvekov
  2006-06-08 23:40   ` Måns Rullgård
  2006-06-09  2:17   ` Horst von Brand
  2006-06-09  6:33 ` Andi Kleen
  3 siblings, 2 replies; 22+ messages in thread

From: Matheus Izvekov @ 2006-06-08 22:48 UTC (permalink / raw)
To: Sascha Nitsch; +Cc: Linux Kernel Mailing List

On 6/8/06, Sascha Nitsch <Sash_lkl@linuxhowtos.org> wrote:
> this is (as of this writing) just an idea.
>
> [... full proposal quoted ...]

I had a somewhat similar idea; once I have time to implement it I'll
submit a patch.

My idea consisted of adding the capability to specify a device when
mounting a tmpfs. If you don't specify any device, tmpfs continues to
behave the way it currently does. But if you do, once the data doesn't
fit in RAM (or exceeds some other limit), tmpfs will flush things to this
device. My intention was to reuse the swap code for this, so you mount a
tmpfs passing the dev node of some unused swap device, and it works just
like tmpfs with a dedicated swap partition. So I hope it would be damn
fast because of the simple disk format; and of course all the data is
lost when you umount it.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 22:48 ` Matheus Izvekov
@ 2006-06-08 23:40   ` Måns Rullgård
  2006-06-09  1:01     ` Matheus Izvekov
  2006-06-09  2:17   ` Horst von Brand
  1 sibling, 1 reply; 22+ messages in thread

From: Måns Rullgård @ 2006-06-08 23:40 UTC (permalink / raw)
To: linux-kernel

"Matheus Izvekov" <mizvekov@gmail.com> writes:
> My idea consisted of adding the capability to specify a device when
> mounting a tmpfs. If you don't specify any device, tmpfs continues to
> behave the way it currently does. But if you do, once the data doesn't
> fit in RAM (or exceeds some other limit), tmpfs will flush things to
> this device. My intention was to reuse the swap code for this, so you
> mount a tmpfs passing the dev node of some unused swap device, and it
> works just like tmpfs with a dedicated swap partition.

I don't see what advantage this would have over normal tmpfs.
--
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 23:40   ` Måns Rullgård
@ 2006-06-09  1:01     ` Matheus Izvekov
  2006-06-09  8:52       ` Måns Rullgård
  0 siblings, 1 reply; 22+ messages in thread

From: Matheus Izvekov @ 2006-06-09 1:01 UTC (permalink / raw)
To: Måns Rullgård; +Cc: linux-kernel

On 6/8/06, Måns Rullgård <mru@inprovide.com> wrote:
> "Matheus Izvekov" <mizvekov@gmail.com> writes:
> > My idea consisted of adding the capability to specify a device when
> > mounting a tmpfs. [...]
>
> I don't see what advantage this would have over normal tmpfs.

The difference is that the swap device is exclusive to the tmpfs mount.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09  1:01     ` Matheus Izvekov
@ 2006-06-09  8:52       ` Måns Rullgård
  0 siblings, 0 replies; 22+ messages in thread

From: Måns Rullgård @ 2006-06-09 8:52 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Måns Rullgård, linux-kernel

Matheus Izvekov said:
> On 6/8/06, Måns Rullgård <mru@inprovide.com> wrote:
>> "Matheus Izvekov" <mizvekov@gmail.com> writes:
>> > My idea consisted of adding the capability to specify a device when
>> > mounting a tmpfs. [...]
>>
>> I don't see what advantage this would have over normal tmpfs.
>
> The difference is that the swap device is exclusive to the tmpfs mount.

Yes, and what would the advantage of that be? Sounds to me like you'd
only end up wasting swap space.
--
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-08 22:48 ` Matheus Izvekov
  2006-06-08 23:40   ` Måns Rullgård
@ 2006-06-09  2:17   ` Horst von Brand
  2006-06-09  4:59     ` Matheus Izvekov
  1 sibling, 1 reply; 22+ messages in thread

From: Horst von Brand @ 2006-06-09 2:17 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Sascha Nitsch, Linux Kernel Mailing List

Matheus Izvekov <mizvekov@gmail.com> wrote:
[...]
> I had a somewhat similar idea; once I have time to implement it I'll
> submit a patch.
> My idea consisted of adding the capability to specify a device when
> mounting a tmpfs. [...] My intention was to reuse the swap code for
> this, so you mount a tmpfs passing the dev node of some unused swap
> device, and it works just like tmpfs with a dedicated swap partition.

tmpfs already uses swap. Giving tmpfs a dedicated swap space is dumb, as
it takes away the possibility of using that space for swapping when it is
not in use by tmpfs (and vice versa).
--
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica              Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria             +56 32 654239
Casilla 110-V, Valparaiso, Chile         Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09  2:17   ` Horst von Brand
@ 2006-06-09  4:59     ` Matheus Izvekov
  2006-06-09 13:43       ` Horst von Brand
  0 siblings, 1 reply; 22+ messages in thread

From: Matheus Izvekov @ 2006-06-09 4:59 UTC (permalink / raw)
To: Horst von Brand; +Cc: Sascha Nitsch, Linux Kernel Mailing List

On 6/8/06, Horst von Brand <vonbrand@inf.utfsm.cl> wrote:
> tmpfs already uses swap. Giving tmpfs a dedicated swap space is dumb,
> as it takes away the possibility of using that space for swapping when
> it is not in use by tmpfs (and vice versa).

The idea is not dumb per se. Maybe you want your applications to swap to
one device (or not swap at all) and a tmpfs mount to swap to another. For
me at least it would make a difference. I don't use swap at all; I have
enough RAM for all my processes. And I've seen that, for some workloads,
putting a temporary directory on tmpfs gives huge speed improvements. But
occasionally the space used in this temp dir will not fit in my RAM, and
in that case swapping would be fine. The problem is that there is
currently no way to enforce this.

Ditto for the fact that, when you have several swap devices with
different performance, there is no way to give priorities/rules
controlling who gets to use each device. When someone gets around to
implementing those features, this wouldn't be needed anymore. But that
seems far enough away to justify a more immediate workaround.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09  4:59     ` Matheus Izvekov
@ 2006-06-09 13:43       ` Horst von Brand
  2006-06-09 15:07         ` Matheus Izvekov
  0 siblings, 1 reply; 22+ messages in thread

From: Horst von Brand @ 2006-06-09 13:43 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

Matheus Izvekov <mizvekov@gmail.com> wrote:
> On 6/8/06, Horst von Brand <vonbrand@inf.utfsm.cl> wrote:
> > tmpfs already uses swap. Giving tmpfs a dedicated swap space is dumb,
> > as it takes away the possibility of using that space for swapping
> > when it is not in use by tmpfs (and vice versa).

> The idea is not dumb per se. Maybe you want your applications to swap
> to one device (or not swap at all) and a tmpfs mount to swap to
> another.

Why? If one device is faster, you'd want to prefer it for swapping /and/
tmpfs. If not, I don't see the point, except for limiting the maximum
size of tmpfs or of swap; but limiting the latter doesn't make much sense
(why go OOM even though swap /is/ available?), and the former can be set
at mount time.

> For me at least it would make a difference.

How?

> I don't use swap at all; I have enough RAM for all my processes.

What is your beef then?

> And I've seen that, for some workloads, putting a temporary directory
> on tmpfs gives huge speed improvements. But occasionally the space used
> in this temp dir will not fit in my RAM, and in that case swapping
> would be fine. The problem is that there is currently no way to enforce
> this.

That is exactly how tmpfs works, and it has worked that way from the
beginning. If it doesn't work that way for you, that is a bug to report.

> Ditto for the fact that, when you have several swap devices with
> different performance, there is no way to give priorities/rules
> controlling who gets to use each device.

There are priorities: see swapon(8). It has worked this way from day one
(or for as long as I can remember, in any case). The "who gets to use
swap and who doesn't" you can control partially by pinning processes to
RAM or by limiting their memory use.

> When someone gets around to implementing those features,

Done already, as far as it makes sense.

> this wouldn't be needed anymore.

Case closed.

> But that seems far enough away to justify a more immediate workaround.

On some level you /have/ to trust the system to do things right. It has
much more detailed information (and better response time) than you could
ever hope to get. Besides, adding even more knobs to fiddle with just
makes the system more complex (and thus bloated/slow) and harder to
manage, for a limited gain in niche situations.
--
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica              Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria             +56 32 654239
Casilla 110-V, Valparaiso, Chile         Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09 13:43       ` Horst von Brand
@ 2006-06-09 15:07         ` Matheus Izvekov
  2006-06-09 18:43           ` Lee Revell
  2006-06-09 23:37           ` Horst von Brand
  0 siblings, 2 replies; 22+ messages in thread

From: Matheus Izvekov @ 2006-06-09 15:07 UTC (permalink / raw)
To: Horst von Brand; +Cc: Sascha Nitsch, Linux Kernel Mailing List

On 6/9/06, Horst von Brand <vonbrand@inf.utfsm.cl> wrote:
> Why? If one device is faster, you'd want to prefer it for swapping
> /and/ tmpfs. If not, I don't see the point. [...]
>
> > For me at least it would make a difference.
>
> How?

OK, but the reality is that even if I set up a swap partition with the
laziest swappiness, it will swap my processes out. Is there a practical
way to pin all processes to RAM, or otherwise tell the VM never to swap
any process? If there is, then you are right: there is no point in doing
this.

> > I don't use swap at all; I have enough RAM for all my processes.
>
> What is your beef then?

I just wanted to have no swap for my processes, but swap for my tmpfs
mount, as I explained. For my usage there is no point in having swap for
processes. If something gets to use that much RAM, something's gone
wrong, and it should die anyway instead of making my system unusable for
several minutes until swap is full too, at which point it dies anyway.

> > And I've seen that, for some workloads, putting a temporary directory
> > on tmpfs gives huge speed improvements. But occasionally the space
> > used in this temp dir will not fit in my RAM, and in that case
> > swapping would be fine. The problem is that there is currently no way
> > to enforce this.
>
> That is exactly how tmpfs works, and it has worked that way from the
> beginning. If it doesn't work that way for you, that is a bug to
> report.

I know it works like this; my point was the separation.

> > Ditto for the fact that, when you have several swap devices with
> > different performance, there is no way to give priorities/rules
> > controlling who gets to use each device.
>
> There are priorities: see swapon(8). [...]
>
> > When someone gets around to implementing those features,
>
> Done already, as far as it makes sense.

Good to know, except that there is no way in the universe the algorithm
can be smart enough to be optimal for all usage cases, so some hand
fiddling can be desirable.

> On some level you /have/ to trust the system to do things right. It has
> much more detailed information (and better response time) than you
> could ever hope to get. Besides, adding even more knobs to fiddle with
> just makes the system more complex (and thus bloated/slow) and harder
> to manage, for a limited gain in niche situations.

If it adds that much overhead, it can always be a compile option. The
system has many knobs already; it's always a compromise. I'm not offering
any proof that what I described is a good compromise, but at least it
could be.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09 15:07         ` Matheus Izvekov
@ 2006-06-09 18:43           ` Lee Revell
  2006-06-09 19:27             ` Matheus Izvekov
  2006-06-09 23:37           ` Horst von Brand
  1 sibling, 1 reply; 22+ messages in thread

From: Lee Revell @ 2006-06-09 18:43 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On Fri, 2006-06-09 at 12:07 -0300, Matheus Izvekov wrote:
> OK, but the reality is that even if I set up a swap partition with the
> laziest swappiness, it will swap my processes out. Is there a practical
> way to pin all processes to RAM, or otherwise tell the VM never to swap
> any process? If there is, then you are right: there is no point in
> doing this.

echo 0 > /proc/sys/vm/swappiness

Lee

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Idea about a disc backed ram filesystem
  2006-06-09 18:43 ` Lee Revell
@ 2006-06-09 19:27 ` Matheus Izvekov
  2006-06-09 19:31 ` Lee Revell
  0 siblings, 1 reply; 22+ messages in thread
From: Matheus Izvekov @ 2006-06-09 19:27 UTC (permalink / raw)
To: Lee Revell; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On 6/9/06, Lee Revell <rlrevell@joe-job.com> wrote:
> On Fri, 2006-06-09 at 12:07 -0300, Matheus Izvekov wrote:
> > Ok, but reality is that, even if i setup a swap partition with the
> > most lazy swapiness, it will swap my processes out. Is there a
> > pratical way to pin all processes to ram or otherwise tell the vm to
> > never swap any process? If there is, then you are right, there is no
> > point in doing this.
>
> echo 0 > /proc/sys/vm/swappiness
>
> Lee

Sorry, I took a look at the code which handles this, and swappiness = 0
doesn't seem to imply that process memory will never be swapped out.
* Re: Idea about a disc backed ram filesystem
  2006-06-09 19:27 ` Matheus Izvekov
@ 2006-06-09 19:31 ` Lee Revell
  2006-06-09 19:43 ` Matheus Izvekov
  0 siblings, 1 reply; 22+ messages in thread
From: Lee Revell @ 2006-06-09 19:31 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On Fri, 2006-06-09 at 16:27 -0300, Matheus Izvekov wrote:
> Sorry, i took a look at the code which handles this and swappiness = 0
> doesnt seem to imply that process memory will never be swapped out.

OK, then use mlockall().

Lee
* Re: Idea about a disc backed ram filesystem
  2006-06-09 19:31 ` Lee Revell
@ 2006-06-09 19:43 ` Matheus Izvekov
  2006-06-09 20:03 ` Lee Revell
  0 siblings, 1 reply; 22+ messages in thread
From: Matheus Izvekov @ 2006-06-09 19:43 UTC (permalink / raw)
To: Lee Revell; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On 6/9/06, Lee Revell <rlrevell@joe-job.com> wrote:
> On Fri, 2006-06-09 at 16:27 -0300, Matheus Izvekov wrote:
> > Sorry, i took a look at the code which handles this and swappiness = 0
> > doesnt seem to imply that process memory will never be swapped out.
>
> OK, then use mlockall().
>
> Lee

If I make init call mlockall(), would all child processes be mlocked
too? If not, using this to enforce a system-wide policy seems a bit
hacky and non-trivial.
* Re: Idea about a disc backed ram filesystem
  2006-06-09 19:43 ` Matheus Izvekov
@ 2006-06-09 20:03 ` Lee Revell
  2006-06-09 21:23 ` Matheus Izvekov
  0 siblings, 1 reply; 22+ messages in thread
From: Lee Revell @ 2006-06-09 20:03 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On Fri, 2006-06-09 at 16:43 -0300, Matheus Izvekov wrote:
> If i make init mlockall, would all child processes be mlocked too?

No.

> If not, using this to enforce a system wide policy seems a bit hacky
> and non trivial.

Well, what you are trying to do seems hacky. What real world problem
are you trying to solve that setting swappiness to 0 is not sufficient
for?

Lee
* Re: Idea about a disc backed ram filesystem
  2006-06-09 20:03 ` Lee Revell
@ 2006-06-09 21:23 ` Matheus Izvekov
  0 siblings, 0 replies; 22+ messages in thread
From: Matheus Izvekov @ 2006-06-09 21:23 UTC (permalink / raw)
To: Lee Revell; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

On 6/9/06, Lee Revell <rlrevell@joe-job.com> wrote:
> Well, what you are trying to do seems hacky. What real world problem
> are you trying to solve that setting swappiness to 0 is not sufficient
> for?
>
> Lee

For my usage, having processes swap is a complete loss. I have enough
RAM, and if some process doesn't fit into RAM I would rather have it
killed than have it swap. Swap activity hogs my system, and probably
either the process would fill up the swap and die anyway, or it would
be too slow to be usable and I would kill it.

But I have some processes which gain a considerable performance benefit
if they do their temporary work on tmpfs. The problem is that just
sometimes their temporary work doesn't fit into RAM. In that case
swapping would be just fine. The simple data format on disk is a gain
when the stuff you are working on doesn't need to survive
unmounting/power loss.

Now I've considered two alternatives:
1) Creating a new filesystem, a very simple one which only stores the
   data on disk, while all the other stuff (superblock etc.) is kept in
   kernel memory, and let the page cache do its work of keeping fresh
   stuff in RAM.
2) Modifying tmpfs to accept a device; when things don't fit in RAM,
   they would be flushed to this device first, while there is space
   available, and ultimately revert to swap when it is present.

It seems to me that both approaches would converge to the same thing,
but 2 is better because there would be no functionality duplication,
and it would get to keep the cool name.
* Re: Idea about a disc backed ram filesystem
  2006-06-09 15:07 ` Matheus Izvekov
  2006-06-09 18:43 ` Lee Revell
@ 2006-06-09 23:37 ` Horst von Brand
  1 sibling, 0 replies; 22+ messages in thread
From: Horst von Brand @ 2006-06-09 23:37 UTC (permalink / raw)
To: Matheus Izvekov; +Cc: Horst von Brand, Sascha Nitsch, Linux Kernel Mailing List

Matheus Izvekov <mizvekov@gmail.com> wrote:
> On 6/9/06, Horst von Brand <vonbrand@inf.utfsm.cl> wrote:
> > Matheus Izvekov <mizvekov@gmail.com> wrote:
> > > On 6/8/06, Horst von Brand <vonbrand@inf.utfsm.cl> wrote:
> > > > tmpfs does use swap currently. Giving tmpfs a dedicated swap space
> > > > is dumb, as it takes away the possibility of using that space for
> > > > swapping when not in use by tmpfs (and viceversa).
>
> > > The idea is not dumb per se. Maybe you want your applications to swap
> > > to one device (or not swap at all) and a tmpfs mount to swap to
> > > another.
>
> > Why? If one device is faster, you'd want to prefer that one for
> > swapping /and/ tmpfs. If not, I don't see the point. Except for
> > limiting maximal sizes of tmpfs or swap, but limiting the later doesn't
> > make much sense (why go OOM even though swap /is/ available?), and the
> > former can be set on mount.
>
> For me at least it would make a difference.

How?

> Ok, but reality is that, even if i setup a swap partition with the
> most lazy swapiness, it will swap my processes out.

When you run out of RAM, or the RAM can be put to better use than
keeping stale process data around (you do realize that program code is
paged directly from the executable, don't you?).

> Is there a
> pratical way to pin all processes to ram or otherwise tell the vm to
> never swap any process? If there is, then you are right, there is no
> point in doing this.

There is: Max out the RAM on your machine. Don't ever run large
processes. Don't ever read large files.

> > > I dont use swap at all, have enough ram for all my processes.
>
> > What is your beef then?
>
> I just wanted to have no swap for my processes, but i wanted swap for
> my tmpfs mount, as i explained.

Use a regular filesystem for /tmp, Linux is pretty good at caching file
data. If it isn't enough, complain /with data/ and /details/ of how it
isn't enough...

> For my usage, there is no point in
> having swap for processes. If something gets to use that much ram,
> somethings gone wrong, and it should die anyway instead of getting my
> system unusable for several minutes until swap is full too, and then
> it dies anyway.

OK. But in any case, this is rare? So it would make not that much of a
difference...

> > > And ive
> > > seen that for some workloads, setting a temporary directory as tmpfs
> > > gives huge speed improvements. But just occasionally, the space used
> > > in this temp dir will not fit in my ram, so in this case swapping
> > > would be fine. The problem is, currently there is no way to enforce
> > > this.
>
> > That is exactly how tmpfs works, and has worked that way from the
> > beginning. If it doesn't for you, it is a bug to report.
>
> I know it works like this, my point was the separation.

Again, I fail to see the point.

> > > Ditto for the fact that, when you have many swap devices set, each
> > > with different performances, there is no way to give priorities/rules
> > > to enforce who uses each device.
> >
> > There are priorities: See swapon(8). It has worked this way from day one
> > (or for as long as I can remember, in any case). The "who gets to use swap
> > and who doesn't" you can control partially via pinning processes to RAM or
> > limiting their memory use.
> >
> > > When someone gets to implement those features,
> >
> > Done already, as far as it makes sense.
>
> Good to know, except that there is no way in the universe the
> algorithm can be smart enough to be optimal to all usage cases,

Right.

> so
> some hand fiddling can be desired.

Desired, yes; but not all desires can (or deserve to) be granted...

> > > this wouldnt be needed
> > > anymore.
>
> > Case closed.
>
> > > But that seems far away enough to justify a more immediate
> > > workaround.
>
> > On some level you /have/ to trust the system to do things right. It has
> > much more detailed information (and better response time) than you could
> > ever hope to get. Besides, adding even more knobs to fiddle just makes the
> > system more complex (and thus bloated/slow) and harder to manage, for a
> > limited gain in niche situations.
>
> If it adds so much overhead, it can always be a compile option.

The overhead is not "just" runtime overhead, it is also developer time
consumption, it is more complex testing (need to check it works
with/without), ...

> The
> system has many knobs already, its always a compromise. Im not giving
> any proof that what i described is a good compromise, but at least it
> can be.

I've given no proof that "just another knob" is a bad idea either, but
the road to massive suckage is paved with "just another little
feature"...
--
Dr. Horst H. von Brand                    User #22616 counter.li.org
Departamento de Informatica               Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria        +56 32 654239
Casilla 110-V, Valparaiso, Chile          Fax:  +56 32 797513
* Re: Idea about a disc backed ram filesystem
  2006-06-08 20:33 ` Idea about a disc backed ram filesystem Sascha Nitsch
  ` (2 preceding siblings ...)
  2006-06-08 22:48 ` Matheus Izvekov
@ 2006-06-09  6:33 ` Andi Kleen
  3 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2006-06-09 6:33 UTC (permalink / raw)
To: Sascha Nitsch; +Cc: linux-kernel

Sascha Nitsch <Sash_lkl@linuxhowtos.org> writes:
> - all files overlayed are still accessible
> - after the first read, the file stays in memory (like a file cache)

Linux has very aggressive file caching and does this effectively by
default for every file system. Sounds like you're trying to reinvent
the wheel.

-Andi
end of thread, other threads: [~2006-06-09 23:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
[not found] <Sash_lkl@linuxhowtos.org>
2006-06-08 20:33 ` Idea about a disc backed ram filesystem Sascha Nitsch
2006-06-08 20:43 ` Lennart Sorensen
2006-06-08 21:12 ` Sash
2006-06-09 10:45 ` Jan Engelhardt
2006-06-08 21:51 ` Horst von Brand
2006-06-08 22:39 ` Joshua Hudson
2006-06-08 22:48 ` Matheus Izvekov
2006-06-08 23:40 ` Måns Rullgård
2006-06-09 1:01 ` Matheus Izvekov
2006-06-09 8:52 ` Måns Rullgård
2006-06-09 2:17 ` Horst von Brand
2006-06-09 4:59 ` Matheus Izvekov
2006-06-09 13:43 ` Horst von Brand
2006-06-09 15:07 ` Matheus Izvekov
2006-06-09 18:43 ` Lee Revell
2006-06-09 19:27 ` Matheus Izvekov
2006-06-09 19:31 ` Lee Revell
2006-06-09 19:43 ` Matheus Izvekov
2006-06-09 20:03 ` Lee Revell
2006-06-09 21:23 ` Matheus Izvekov
2006-06-09 23:37 ` Horst von Brand
2006-06-09 6:33 ` Andi Kleen