* Fwd: (un)loadable module support for zcache
[not found] <CABv5NL-SquBQH8W+K1CXNBQQWqHyYO+p3Y9sPqsbfZKp5EafTg@mail.gmail.com>
@ 2012-03-05 0:46 ` Ilendir
2012-03-05 16:57 ` Dan Magenheimer
1 sibling, 0 replies; 6+ messages in thread
From: Ilendir @ 2012-03-05 0:46 UTC (permalink / raw)
To: linux-mm; +Cc: ngupta
While experimenting with zcache on various systems, we observed what
seems to be a differing impact on CPU load and power consumption,
varying from system to system and workload. While there has been some
research on the effect of online memory compression on power
consumption [1], the trade-off, for example when using SSDs or on
mobile platforms (e.g. Android), remains unclear. It would therefore
be desirable to make these effects easier to study, using zcache as
an example. But zcache is missing an important feature: dynamic
disabling and enabling. This is a big obstacle to further analysis.
Since we have to do some free-to-choose work on a Linux-related topic
during an internship at the University of Erlangen, we'd like to
implement this feature.
Moreover, if we achieve our goal, an unloadable zcache module isn't
far away. Once that is accomplished, one of the blockers to getting
zcache out of the staging tree is gone.
Any advice is appreciated.
Florian Schmaus
Stefan Hengelein
Andor Daam
[1] http://ziyang.eecs.umich.edu/~dickrp/publications/yang-crames-tecs.pdf
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org
* RE: (un)loadable module support for zcache
[not found] <CABv5NL-SquBQH8W+K1CXNBQQWqHyYO+p3Y9sPqsbfZKp5EafTg@mail.gmail.com>
2012-03-05 0:46 ` Fwd: (un)loadable module support for zcache Ilendir
@ 2012-03-05 16:57 ` Dan Magenheimer
2012-03-08 14:36 ` Florian Schmaus
1 sibling, 1 reply; 6+ messages in thread
From: Dan Magenheimer @ 2012-03-05 16:57 UTC (permalink / raw)
To: Ilendir, linux-mm
Cc: sjenning, Konrad Wilk, fschmaus, Andor Daam, i4passt, devel,
Nitin Gupta
> From: Ilendir [mailto:ilendir@googlemail.com]
> Subject: (un)loadable module support for zcache
>
> While experimenting with zcache on various systems, we observed what
> seems to be a differing impact on CPU load and power consumption,
> varying from system to system and workload. While there has been some
> research on the effect of online memory compression on power
> consumption [1], the trade-off, for example when using SSDs or on
> mobile platforms (e.g. Android), remains unclear. It would therefore
> be desirable to make these effects easier to study, using zcache as
> an example. But zcache is missing an important feature: dynamic
> disabling and enabling. This is a big obstacle to further analysis.
> Since we have to do some free-to-choose work on a Linux-related topic
> during an internship at the University of Erlangen, we'd like to
> implement this feature.
>
> Moreover, if we achieve our goal, an unloadable zcache module isn't
> far away. Once that is accomplished, one of the blockers to getting
> zcache out of the staging tree is gone.
>
> Any advice is appreciated.
>
> Florian Schmaus
> Stefan Hengelein
> Andor Daam
Hi Florian, Stefan, and Andor --
Thanks for your interest in zcache development!
I see you've sent your original email separately to different lists,
so I will try to combine them into one cc list now, so that hopefully
there will be one thread.
Your idea of studying power consumption tradeoffs is interesting
and the work to allow zcache to be installed as a module will
also be very useful.
I have given some thought to what would be necessary to allow
zcache (or Xen tmem, or RAMster) to be insmod'ed and rmmod'ed.
There are two main technical difficulties that I see. There
may be more but let's start with these two.
First, the "tmem frontend" code in cleancache and frontswap
assumes that a "tmem backend" (such as zcache, Xen tmem, or
RAMster) has already registered when filesystems are mounted
(for cleancache) and when swapon is run (for frontswap).
If no tmem backend has yet registered when the mount (or swapon)
is invoked, then cleancache_enabled (or frontswap_enabled) has
not been set to 1, and the corresponding init_fs/init routine
has not been called and no tmem "pool" gets created.
Then if zcache later registers with cleancache/frontswap, it
is too late... there are no mounts or swapons to trigger the
calls that create the tmem pools. As a result, all gets and
puts and flushes will fail, and zcache does not work.
I think the answer here is for cleancache (and frontswap) to
support "lazy pool creation". If a backend has not yet
registered when an init_fs/init call is made, cleancache
(or frontswap) must record the attempt and generate a valid
"fake poolid" to return. Any calls to put/get/flush with
a fake poolid are ignored as the zcache module is not
yet loaded. Later, when zcache is insmod'ed, it will attempt
to register and cleancache must then call the init_fs/init
routines (to "lazily" create the pools), obtain a "real poolid"
from zcache for each pool and "map" the fake poolid to the real
poolid on EVERY get/put/flush and on pool destroy (umount/swapoff).
I think all changes for this will be in mm/cleancache.c and
mm/frontswap.c... the backend does not need to know anything
about it.
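A minimal userspace sketch of that fake-poolid indirection might look
like the following. All names here are hypothetical; the real change
would live in mm/cleancache.c and mm/frontswap.c and use the backend's
actual pool-creation calls:

```c
#include <assert.h>

#define MAX_POOLS      16
#define FAKE_POOL_BASE 1000    /* hypothetical offset marking fake ids */

static int fake_to_real[MAX_POOLS]; /* real poolid, or -1 while no backend */
static int nr_pools;

/* Called from init_fs/init at mount/swapon time; always succeeds by
 * handing out a fake poolid, even when no backend has registered. */
static int lazy_pool_create(void)
{
	int i = nr_pools++;

	fake_to_real[i] = -1;           /* pool not really created yet */
	return FAKE_POOL_BASE + i;
}

/* Mapping done on EVERY get/put/flush: -1 means "ignore the call". */
static int real_poolid(int fake)
{
	return fake_to_real[fake - FAKE_POOL_BASE];
}

/* Backend (e.g. zcache) registers: lazily create the recorded pools
 * and remember the real poolid the backend returns for each. */
static void backend_register(void)
{
	for (int i = 0; i < nr_pools; i++)
		fake_to_real[i] = i;    /* stand-in for the backend's id */
}
```

The point of the indirection is that mounts and swapons never fail:
they always get a (possibly fake) poolid, and registration later
patches the mapping in place.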
This implementation will not be hard, but there may be a few
corner cases that you will need to get right, and of course
any code changes should follow proper Linux coding style.
Second issue: When zcache gets rmmod'ed, there is an issue of
coherency. You need to ensure that if zcache goes through
insmod -> rmmod -> insmod
that no stale data remains in any tmem pool. If any
stale data remains, a "get" of the old data may result in
data corruption.
The problem is that there may be millions of pages in
cleancache and flushing those pages may take a very long
time. The user will not want to wait that long. And
for frontswap, frontswap_shrink must be called and since
every page in frontswap contains real user data, you must
ensure that all pages get decompressed and removed from
frontswap either into physical RAM or a physical swap disk.
(See frontswap_shrink in frontswap.c and frontswap_selfshrink
in the RAMster code.) This may take a very VERY long time.
So rmmod cannot complete until all the data in cleancache
is freed and all the data in frontswap is repatriated to RAM
or swap disk.
I don't have an easy answer for this one. It may be possible
to have "zombie" lists of partially destroyed pages and a
kernel thread that (after rmmod completes) walks the list and
frees or frontswap_shrinks the pages. I will leave this
to you to solve... it is likely the hardest problem for
making zcache work as a module. If you can't get it to work,
it would still be useful to be able to "insmod" zcache,
even if "rmmod" is not possible.
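The "zombie list" idea might look roughly like this sketch (userspace,
with hypothetical names; in the real version a kernel thread would
drain the list in the background after rmmod returns):

```c
#include <assert.h>
#include <stdlib.h>

/* A page whose data was still live when rmmod was requested. */
struct zombie_page {
	struct zombie_page *next;
	int from_frontswap;     /* must be repatriated, not just freed */
};

static struct zombie_page *zombie_list;

/* rmmod path: instead of freeing millions of pages synchronously,
 * move each one onto the zombie list and return immediately. */
static void zombie_defer(struct zombie_page *p)
{
	p->next = zombie_list;
	zombie_list = p;
}

/* Kernel-thread path: drain the list in the background.  Cleancache
 * pages are simply freed; frontswap pages would first be repatriated
 * (decompressed into RAM or written to the swap device). */
static int zombie_drain(void)
{
	int n = 0;

	while (zombie_list) {
		struct zombie_page *p = zombie_list;

		zombie_list = p->next;
		free(p);
		n++;
	}
	return n;
}
```

The coherency requirement then reduces to making sure a re-insmod'ed
zcache never sees a poolid whose zombie pages have not yet drained.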
Thanks,
Dan
* Re: (un)loadable module support for zcache
2012-03-05 16:57 ` Dan Magenheimer
@ 2012-03-08 14:36 ` Florian Schmaus
2012-03-08 15:52 ` Dan Magenheimer
0 siblings, 1 reply; 6+ messages in thread
From: Florian Schmaus @ 2012-03-08 14:36 UTC (permalink / raw)
To: Dan Magenheimer, linux-mm
Cc: Stefan Hengelein, sjenning, Konrad Wilk, Andor Daam, i4passt,
devel, Nitin Gupta
On 03/05/12 17:57, Dan Magenheimer wrote:
> I think the answer here is for cleancache (and frontswap) to
> support "lazy pool creation". If a backend has not yet
> registered when an init_fs/init call is made, cleancache
> (or frontswap) must record the attempt and generate a valid
> "fake poolid" to return. Any calls to put/get/flush with
> a fake poolid are ignored as the zcache module is not
> yet loaded. Later, when zcache is insmod'ed, it will attempt
> to register and cleancache must then call the init_fs/init
> routines (to "lazily" create the pools), obtain a "real poolid"
> from zcache for each pool and "map" the fake poolid to the real
> poolid on EVERY get/put/flush and on pool destroy (umount/swapoff).
We were thinking about how to make cleancache and frontswap cope with
the mounting of filesystems and the running of swapon when no backend
is registered, without adding the indirection of a fake pool id map.
We figured one way to deal with this in cleancache would be to store
the struct super_block pointers in an array for every call to init_fs,
and the uuids and struct super_block pointers in separate arrays for
every call to init_shared_fs. When a filesystem unmounts before a
backend is registered, its entries in the respective arrays are
removed.
While no backend is registered, put_page() and invalidate_page() are
ignored and get_page() fails. As soon as a backend registers, the
init_fs and init_shared_fs functions are called for the struct
super_block pointers (and uuids) stored in the corresponding arrays.
For frontswap we are aiming for a similar approach: remembering the
type for every call to init, failing put_page(), and ignoring
get_page() and invalidate_page().
Again, when a backend registers, init is called for every stored type.
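A rough userspace sketch of this record-and-replay scheme (names are
hypothetical; the real code would live in mm/cleancache.c and operate
on actual struct super_block pointers):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FS 32

/* Superblocks seen by init_fs before any backend registered. */
static const void *pending_sb[MAX_FS];
static int nr_pending;

/* Mount path: no backend yet, so just remember the superblock. */
static void record_init_fs(const void *sb)
{
	pending_sb[nr_pending++] = sb;
}

/* Unmount before a backend registers: drop the matching entry so a
 * stale (possibly reused) pointer is never replayed later. */
static void forget_fs(const void *sb)
{
	for (int i = 0; i < nr_pending; i++)
		if (pending_sb[i] == sb) {
			pending_sb[i] = pending_sb[--nr_pending];
			return;
		}
}

/* Backend registration: replay init_fs for every recorded superblock
 * (here we just report how many replays would happen). */
static int replay_pending(void)
{
	return nr_pending;
}
```

The forget_fs() step is the critical one: every unmount must reliably
remove its entry, or a recorded pointer could later refer to freed or
reused memory.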
This should allow backends to register with cleancache and frontswap
even after filesystems have been mounted and/or swapon has been run.
It should therefore allow zcache to be insmodded, which would be a
first step towards allowing it to be rmmodded as well.
Is this approach feasible?
Stefan, Florian, and Andor
* RE: (un)loadable module support for zcache
2012-03-08 14:36 ` Florian Schmaus
@ 2012-03-08 15:52 ` Dan Magenheimer
2012-03-08 16:51 ` Andor Daam
0 siblings, 1 reply; 6+ messages in thread
From: Dan Magenheimer @ 2012-03-08 15:52 UTC (permalink / raw)
To: Florian Schmaus, linux-mm
Cc: Stefan Hengelein, sjenning, Konrad Wilk, Andor Daam, i4passt,
devel, Nitin Gupta
> From: Florian Schmaus [mailto:fschmaus@gmail.com]
> Subject: Re: (un)loadable module support for zcache
>
> On 03/05/12 17:57, Dan Magenheimer wrote:
> > I think the answer here is for cleancache (and frontswap) to
> > support "lazy pool creation". If a backend has not yet
> > registered when an init_fs/init call is made, cleancache
> > (or frontswap) must record the attempt and generate a valid
> > "fake poolid" to return. Any calls to put/get/flush with
> > a fake poolid are ignored as the zcache module is not
> > yet loaded. Later, when zcache is insmod'ed, it will attempt
> > to register and cleancache must then call the init_fs/init
> > routines (to "lazily" create the pools), obtain a "real poolid"
> > from zcache for each pool and "map" the fake poolid to the real
> > poolid on EVERY get/put/flush and on pool destroy (umount/swapoff).
>
> We were thinking about how to make cleancache and frontswap cope with
> the mounting of filesystems and the running of swapon when no backend
> is registered, without adding the indirection of a fake pool id map.
>
> We figured one way to deal with this in cleancache would be to store
> the struct super_block pointers in an array for every call to
> init_fs, and the uuids and struct super_block pointers in separate
> arrays for every call to init_shared_fs. When a filesystem unmounts
> before a backend is registered, its entries in the respective arrays
> are removed.
> While no backend is registered, put_page() and invalidate_page() are
> ignored and get_page() fails. As soon as a backend registers, the
> init_fs and init_shared_fs functions are called for the struct
> super_block pointers (and uuids) stored in the corresponding arrays.
>
> For frontswap we are aiming for a similar approach: remembering the
> type for every call to init, failing put_page(), and ignoring
> get_page() and invalidate_page().
> Again, when a backend registers, init is called for every stored type.
>
> This should allow backends to register with cleancache and frontswap
> even after filesystems have been mounted and/or swapon has been run.
> It should therefore allow zcache to be insmodded, which would be a
> first step towards allowing it to be rmmodded as well.
>
> Is this approach feasible?
Hi Stefan, Florian, and Andor --
I do see a potential problem with this approach. You would
be saving a superblock pointer and then using it later. What
if the filesystem was unmounted in the meantime? Or, worse,
what if it was unmounted and the address of the superblock
was then reused for some completely different object?
I think if you ensure that cleancache_invalidate_fs() is always
called when a cleancache-enabled filesystem is unmounted, and in
cleancache_invalidate_fs() you remove the matching superblock
pointer from your arrays, then it should work.
Dan
* Re: (un)loadable module support for zcache
2012-03-08 15:52 ` Dan Magenheimer
@ 2012-03-08 16:51 ` Andor Daam
2012-03-08 17:07 ` Dan Magenheimer
0 siblings, 1 reply; 6+ messages in thread
From: Andor Daam @ 2012-03-08 16:51 UTC (permalink / raw)
To: Dan Magenheimer
Cc: Florian Schmaus, linux-mm, Stefan Hengelein, sjenning,
Konrad Wilk, i4passt, devel, Nitin Gupta
2012/3/8 Dan Magenheimer <dan.magenheimer@oracle.com>
>
> > From: Florian Schmaus [mailto:fschmaus@gmail.com]
> > Subject: Re: (un)loadable module support for zcache
> >
> > On 03/05/12 17:57, Dan Magenheimer wrote:
> > > I think the answer here is for cleancache (and frontswap) to
> > > support "lazy pool creation". If a backend has not yet
> > > registered when an init_fs/init call is made, cleancache
> > > (or frontswap) must record the attempt and generate a valid
> > > "fake poolid" to return. Any calls to put/get/flush with
> > > a fake poolid are ignored as the zcache module is not
> > > yet loaded. Later, when zcache is insmod'ed, it will attempt
> > > to register and cleancache must then call the init_fs/init
> > > routines (to "lazily" create the pools), obtain a "real poolid"
> > > from zcache for each pool and "map" the fake poolid to the real
> > > poolid on EVERY get/put/flush and on pool destroy (umount/swapoff).
> >
> > We were thinking about how to make cleancache and frontswap cope
> > with the mounting of filesystems and the running of swapon when no
> > backend is registered, without adding the indirection of a fake
> > pool id map.
> >
> > We figured one way to deal with this in cleancache would be to
> > store the struct super_block pointers in an array for every call to
> > init_fs, and the uuids and struct super_block pointers in separate
> > arrays for every call to init_shared_fs. When a filesystem unmounts
> > before a backend is registered, its entries in the respective
> > arrays are removed.
> > While no backend is registered, put_page() and invalidate_page()
> > are ignored and get_page() fails. As soon as a backend registers,
> > the init_fs and init_shared_fs functions are called for the struct
> > super_block pointers (and uuids) stored in the corresponding arrays.
> >
> > For frontswap we are aiming for a similar approach: remembering the
> > type for every call to init, failing put_page(), and ignoring
> > get_page() and invalidate_page().
> > Again, when a backend registers, init is called for every stored type.
> >
> > This should allow backends to register with cleancache and frontswap
> > even after filesystems have been mounted and/or swapon has been run.
> > It should therefore allow zcache to be insmodded, which would be a
> > first step towards allowing it to be rmmodded as well.
> >
> > Is this approach feasible?
>
> Hi Stefan, Florian, and Andor --
>
> I do see a potential problem with this approach. You would
> be saving a superblock pointer and then using it later. What
> if the filesystem was unmounted in the meantime? Or, worse,
> what if it was unmounted and the address of the superblock
> was then reused for some completely different object?
>
> I think if you ensure that cleancache_invalidate_fs() is always
> called when a cleancache-enabled filesystem is unmounted, and in
> cleancache_invalidate_fs() you remove the matching superblock
> pointer from your arrays, then it should work.
>
> Dan
We already thought of removing the matching pointer whenever a
filesystem is unmounted.
As the comment on __cleancache_invalidate_fs in cleancache.c states
that this function is called by any cleancache-enabled filesystem at
the time of unmount, we assumed that it was indeed always called upon
unmount.
Is it not certain that this function is always called?
Andor
* RE: (un)loadable module support for zcache
2012-03-08 16:51 ` Andor Daam
@ 2012-03-08 17:07 ` Dan Magenheimer
0 siblings, 0 replies; 6+ messages in thread
From: Dan Magenheimer @ 2012-03-08 17:07 UTC (permalink / raw)
To: Andor Daam
Cc: Florian Schmaus, linux-mm, Stefan Hengelein, sjenning,
Konrad Wilk, i4passt, devel, Nitin Gupta
> From: Andor Daam [mailto:andor.daam@googlemail.com]
> Subject: Re: (un)loadable module support for zcache
>
> 2012/3/8 Dan Magenheimer <dan.magenheimer@oracle.com>
> >
> > > From: Florian Schmaus [mailto:fschmaus@gmail.com]
> > > Subject: Re: (un)loadable module support for zcache
> > >
> > > This should allow backends to register with cleancache and frontswap
> > > even after filesystems have been mounted and/or swapon has been run.
> > > It should therefore allow zcache to be insmodded, which would be a
> > > first step towards allowing it to be rmmodded as well.
> > >
> > > Is this approach feasible?
> >
> > Hi Stefan, Florian, and Andor --
> >
> > I do see a potential problem with this approach. You would
> > be saving a superblock pointer and then using it later. What
> > if the filesystem was unmounted in the meantime? Or, worse,
> > what if it was unmounted and the address of the superblock
> > was then reused for some completely different object?
> >
> > I think if you ensure that cleancache_invalidate_fs() is always
> > called when a cleancache-enabled filesystem is unmounted, and in
> > cleancache_invalidate_fs() you remove the matching superblock
> > pointer from your arrays, then it should work.
>
> We already thought of removing the matching pointer whenever a
> filesystem is unmounted.
Great!
> As the comment on __cleancache_invalidate_fs in cleancache.c states
> that this function is called by any cleancache-enabled filesystem at
> the time of unmount, we assumed that it was indeed always called upon
> unmount.
Hi Andor --
Until now, cleancache_invalidate_fs was only called for garbage
collection so it didn't really matter. Since, after your work is
done, a missed call to cleancache_invalidate_fs has the potential
to cause data corruption, it's probably best to be paranoid
and verify.
> Is it not certain that this function is always called?
I *think* it should always be called, but I am not a filesystem expert.
It might be worth asking the question on a filesystem mailing list
(or on the individual lists for ext3/4, ocfs2, btrfs): "Is it
ever possible for a superblock for a mounted filesystem to be free'd
without a previous call to unmount the filesystem?" And you might
want to check the call points for cleancache_invalidate_fs (in each
of the filesystems) to see if there are error conditions which
would skip the call to cleancache_invalidate_fs.
Alternately, if you generate and keep track of a "fake pool id"
and map it (after the backend registers) to a real pool id,
I think there's no risk. However, I agree your solution is
more elegant, so as long as you verify that there is no chance
of data corruption, I am OK with your solution.
Dan