* Containers and /proc/sys/vm/drop_caches
@ 2011-01-05 9:40 Mike Hommey
From: Mike Hommey @ 2011-01-05 9:40 UTC
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
[Copy/pasted from a previous message to lkml, where it was suggested to
try containers@]
Hi,
I noticed that from within a lxc container, writing "3" to
/proc/sys/vm/drop_caches would flush the host page cache. That sounds a
little dangerous for VPS offerings that would be based on lxc, as the root
user in one VPS instance could impact the overall performance of the host.
I don't know about other containers but I've been told openvz isn't
subject to this problem.
I only tested the current Debian Squeeze kernel, which is based on
2.6.32.27.
Cheers,
Mike
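
The write described in this message can be sketched as follows. This is a minimal illustration, not code from the thread: the function name `request_drop_caches` and the `path` parameter are assumptions added so the sketch can be tried against a scratch file instead of the real sysctl. The accepted values (1 = page cache, 2 = dentries and inodes, 3 = both) are the ones documented for this sysctl.

```python
from pathlib import Path

# Real location of the knob; writing to it needs root and affects the whole host.
DROP_CACHES = Path("/proc/sys/vm/drop_caches")

# 1 = page cache, 2 = dentries and inodes, 3 = both.
VALID_MODES = {1, 2, 3}


def request_drop_caches(mode: int, path: Path = DROP_CACHES) -> int:
    """Write `mode` followed by a newline to `path`.

    Returns the number of characters written. `path` is a hypothetical
    parameter for this sketch so that a scratch file can stand in for
    the real sysctl.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode}")
    with open(path, "w") as f:
        return f.write(f"{mode}\n")
```

Calling `request_drop_caches(3)` as root inside a container on the kernel described above would be the equivalent of the write Mike reports; pointing `path` at a temporary file is the safe way to experiment.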

* Re: Containers and /proc/sys/vm/drop_caches
From: Daniel Lezcano @ 2011-01-05 9:49 UTC
To: Mike Hommey; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/05/2011 10:40 AM, Mike Hommey wrote:
> I noticed that from within a lxc container, writing "3" to
> /proc/sys/vm/drop_caches would flush the host page cache.

There is definitely a lot of work to do with /proc.

Some files should not be accessible (/proc/sys/vm/drop_caches,
/proc/sys/kernel/sysrq, ...) and some others should be virtualized
(/proc/meminfo, /proc/cpuinfo, ...).

Serge suggested creating something similar to the cgroup device
whitelist but for /proc; maybe it is a good approach for denying
access to specific /proc files.
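
The split Daniel describes (deny some /proc files, virtualize others) can be modelled as a small lookup table. The sketch below is purely illustrative: `ProcPolicy`, `RULES`, and `classify` are hypothetical names, no such kernel or lxc interface is implied, and the rule set contains only the four paths named in the message. Unlisted paths default to ALLOW here, which makes this closer to a deny list; a real whitelist in the style of the cgroup device controller would default to DENY.

```python
import posixpath
from enum import Enum


class ProcPolicy(Enum):
    ALLOW = "allow"            # pass through to the host's /proc entry
    DENY = "deny"              # refuse access from inside the container
    VIRTUALIZE = "virtualize"  # present a per-container view


# Hypothetical rule table built only from the examples in the message.
RULES = {
    "/proc/sys/vm/drop_caches": ProcPolicy.DENY,
    "/proc/sys/kernel/sysrq": ProcPolicy.DENY,
    "/proc/meminfo": ProcPolicy.VIRTUALIZE,
    "/proc/cpuinfo": ProcPolicy.VIRTUALIZE,
}


def classify(path: str) -> ProcPolicy:
    """Return the policy for `path`, normalizing '.' and '..' components first."""
    return RULES.get(posixpath.normpath(path), ProcPolicy.ALLOW)
```

Normalizing the path before the lookup is the one design choice worth noting: without it, a request for `/proc/sys/vm/../vm/drop_caches` would slip past a purely textual match.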

* Re: Containers and /proc/sys/vm/drop_caches
From: Serge Hallyn @ 2011-01-05 14:01 UTC
To: Daniel Lezcano
Cc: Mike Hommey, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> Some files should not be accessible (/proc/sys/vm/drop_caches,
> /proc/sys/kernel/sysrq, ...) and some others should be virtualized
> (/proc/meminfo, /proc/cpuinfo, ...).
>
> Serge suggested creating something similar to the cgroup device
> whitelist but for /proc; maybe it is a good approach for denying
> access to specific /proc files.

Long-term, user namespaces should fix this - /proc will be owned
by the user namespace which mounted it, but we can tell proc to
always have some files (like drop_caches) be owned by init_user_ns.

I'm hoping to push my final targeted capabilities prototype in the
next few weeks, and after that I start seriously attacking VFS
interaction.

In the meantime, though, you can use SELinux/Smack, or a custom
cgroup file does sound useful. Can cgroups be modules nowadays?
(I can't keep up) If so, an out of tree proc-cgroup module seems
like a good interim solution.

-serge

* Re: Containers and /proc/sys/vm/drop_caches
From: Balbir Singh @ 2011-01-05 14:16 UTC
To: Serge Hallyn
Cc: Mike Hommey, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Jan 5, 2011 at 7:31 PM, Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> In the meantime, though, you can use SELinux/Smack, or a custom
> cgroup file does sound useful. Can cgroups be modules nowadays?
> (I can't keep up) If so, an out of tree proc-cgroup module seems
> like a good interim solution.

Ideally, drop_caches should drop only the page cache belonging to that
container, but given that containers have a lot of shared page cache,
what is suggested might be a good way to work around the problem.

Balbir

* Re: Containers and /proc/sys/vm/drop_caches
From: Matt Helsley @ 2011-01-06 21:43 UTC
To: Balbir Singh
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Hommey

On Wed, Jan 05, 2011 at 07:46:17PM +0530, Balbir Singh wrote:
> Ideally a drop_cache should drop page cache in that container, but
> given container have a lot of shared page cache, what is suggested
> might be a good way to work around the problem

One gross hack that comes to mind: instead of a hard permission model,
limit the frequency with which the container can actually drop caches.
Then the container's ability to interfere with host performance is more
limited (but still non-zero). Or limit the frequency on a per-user basis
(more like Serge's design), because running more containers from a
compromised user account shouldn't allow more frequent cache dropping.

That said, the more important question is why should we provide
drop_caches inside a container? My understanding is it's largely a
workload-debugging tool and not something meant to truly solve
problems. If that's the case then we shouldn't provide it at all or it
should actually interfere with the host cache.

Cheers,
-Matt Helsley
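
The frequency limit Matt proposes can be sketched as a minimum interval per key, where the key is a container ID for the first variant and a user ID for the per-user variant. This is a userspace model under stated assumptions, not kernel code: the class name `DropCachesLimiter`, the `min_interval_s` parameter, and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable, Dict, Hashable


class DropCachesLimiter:
    """Allow at most one cache drop per key every `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last: Dict[Hashable, float] = {}  # key -> time of last allowed drop

    def allow(self, key: Hashable) -> bool:
        """Return True and record the time if `key` may drop caches now."""
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon since this key's last allowed drop
        self._last[key] = now
        return True
```

Keying on the user rather than the container gives the property Matt mentions: starting more containers under the same account does not buy more frequent drops, because they all share one entry in the table.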

* Re: Containers and /proc/sys/vm/drop_caches
From: Dave Hansen @ 2011-01-06 21:50 UTC
To: Matt Helsley
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Hommey, Balbir Singh

On Thu, 2011-01-06 at 13:43 -0800, Matt Helsley wrote:
> That said, the more important question is why should we provide
> drop_caches inside a container? My understanding is it's largely a
> workload-debugging tool and not something meant to truly solve
> problems. If that's the case then we shouldn't provide it at all or it
> should actually interfere with the host cache.

Yeah, what's the problem that you're solving with drop_caches? The odds
are, there's a better way.

That said, it _might_ be worth doing things like dropping (inode or
dentry) caches per-sb. That's a much better fit than using big, ugly,
loosely-defined, system-wide knobs like drop_caches.

Also, unless we start giving containers real ownership of devices or
partitions, it's going to be pretty darn hard to let things clear caches
in a meaningful way. What if one container wants an object cleared
while another doesn't?

-- Dave

* Re: Containers and /proc/sys/vm/drop_caches
From: Matt Helsley @ 2011-01-06 22:08 UTC
To: Dave Hansen
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Hommey, Balbir Singh

On Thu, Jan 06, 2011 at 01:50:05PM -0800, Dave Hansen wrote:
> That said, it _might_ be worth doing things like dropping (inode or
> dentry) caches per-sb. That's a much better fit than using big, ugly,
> loosely-defined, system-wide knobs like drop_caches.

Yup. Since many containers will have their own mount namespaces with
separate sbs, it's a more reasonable approximation of per-container
dropping of caches.

> Also, unless we start giving containers real ownership of devices or
> partitions, it's going to be pretty darn hard to let things clear caches
> in a meaningful way. What if one container wants an object cleared
> while another doesn't?

Good point. First reaction: we'd want to keep it cached if any of the
containers want it. But even that's a bad policy under certain
circumstances in which containers (aka VPS) might be used.

Is drop_caches well-defined? IOW would it be permissible to not
actually drop all or any of the cache entries, or to do nothing and
still report success instead of, say, EPERM, to a container?

Cheers,
-Matt Helsley
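
To make the "per-sb" idea concrete, one way to see which filesystems a mount namespace references is to parse /proc/self/mountinfo. The sketch below is an approximation under stated assumptions: it uses the `major:minor` device field as a stand-in for a superblock, and the function name `superblocks_from_mountinfo` is hypothetical. It only lists candidates; it does not drop anything, since the thread does not describe an existing per-sb drop interface.

```python
from typing import Dict, List


def superblocks_from_mountinfo(text: str) -> Dict[str, List[str]]:
    """Map each 'major:minor' device ID to the mount points that use it.

    Expects text in the /proc/<pid>/mountinfo format:
    mount-id parent-id major:minor root mount-point options [optional...] - fstype source super-options
    """
    result: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # A valid line has at least 10 fields and contains the '-' separator.
        if len(fields) < 10 or "-" not in fields:
            continue
        device, mount_point = fields[2], fields[4]
        result.setdefault(device, []).append(mount_point)
    return result
```

A device ID that appears in more than one container's view is exactly the shared case Dave raises: dropping caches for it on behalf of one container affects the others too.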

* Re: Containers and /proc/sys/vm/drop_caches
From: Dave Hansen @ 2011-01-06 22:15 UTC
To: Matt Helsley
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Hommey, Balbir Singh

On Thu, 2011-01-06 at 14:08 -0800, Matt Helsley wrote:
> Is drop_caches well-defined? IOW would it be permissible to
> not actually drop all or any of the cache entries or to do nothing and
> still report success instead of, say, EPERM, to a container?

It's really just a hint or a request. It's possible that the write
behind "echo 3 > /proc/sys/vm/drop_caches" returns '2' (for the two
bytes written), indicating success, and yet not a single object was
freed. There's currently no way to tell how much work it did, or to
figure out why it did a certain amount of work.

Frankly, in a container, it probably just shouldn't even show up in
/proc.

-- Dave
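
Since the write itself reports nothing about the work done, the only rough indication available from userspace is to compare /proc/meminfo before and after. The following sketch assumes that approach; `parse_meminfo` and `cache_delta_kib` are hypothetical helper names, and the result is only an estimate because other processes keep filling and evicting the cache in between. Inside a container on the kernel discussed here, /proc/meminfo would also show host-wide values, as noted earlier in the thread.

```python
from typing import Dict, Iterable


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Name:   value kB' lines into {name: value}; values are in kB where a unit is given."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip anything that is not 'Name: <number> ...'
        values[name.strip()] = int(parts[0])
    return values


def cache_delta_kib(before: str, after: str,
                    fields: Iterable[str] = ("Cached", "Buffers")) -> int:
    """Return how much the summed `fields` shrank between two meminfo snapshots."""
    b, a = parse_meminfo(before), parse_meminfo(after)
    return sum(b.get(f, 0) for f in fields) - sum(a.get(f, 0) for f in fields)
```

Typical use would be to read /proc/meminfo, perform the write, read it again, and pass both snapshots to `cache_delta_kib`; a result near zero is consistent with Dave's point that success does not imply anything was freed.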

* Re: Containers and /proc/sys/vm/drop_caches
From: Rob Landley @ 2011-01-07 13:03 UTC
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/06/2011 03:43 PM, Matt Helsley wrote:
> On Wed, Jan 05, 2011 at 07:46:17PM +0530, Balbir Singh wrote:
>> On Wed, Jan 5, 2011 at 7:31 PM, Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>> Long-term, user namespaces should fix this - /proc will be owned
>>> by the user namespace which mounted it, but we can tell proc to
>>> always have some files (like drop_caches) be owned by init_user_ns.

Changing ownership so a script can't open a file that it otherwise
could may cause scripts to fail when run in a container. That makes the
containers less transparent.

> One gross hack that comes to mind: Instead of a hard permission model
> limit the frequency with which the container could actually drop caches.
> Then the container's ability to interfere with host performance is more
> limited (but still non-zero). Or limit frequency on a per-user basis
> (more like Serge's design) because running more containers by a
> compromised user account shouldn't allow more frequent cache dropping.

Disk access causes at best multi-millisecond latency spikes, which can
cause a heavily loaded server to go into thrashing meltdown. So a
container could screw up another container with this pretty badly.

The easy short-term fix is to make containers silently ignore writes to
drop_caches.

> That said, the more important question is why should we provide
> drop_caches inside a container? My understanding is it's largely a
> workload-debugging tool and not something meant to truly solve
> problems.

A heavily loaded system that goes deep into swap without triggering the
OOM killer can become pretty useless. My home laptop with 2 gigs of ram
gets so sluggish whenever I compile something that you can't use the
touchpad anymore, because hitting the boundary of a widget with the
mouse pointer causes a 5 second freeze while it bounces off three or
four processes to handle the message, evicting yet more pages to fault
in the pages to handle the X events. By the time the pointer moves again
it's way overshot. (Ok, having firefox, chrome, and kmail open with
several dozen tabs open in each may have something to do with this.)

When it does this, switching to a console with ctrl-alt-f1 and running
"echo 1 > /proc/sys/vm/drop_caches" is just about the only thing that
will snap it out of it short of killing processes. The system has ~600
megs of ram tied up in disk cache while being so short of anonymous
pages the mouse is useless.

That doesn't necessarily apply to containers, but that's one use case:
using it as a stick to hit the darn overburdened machine when it's
making stupid memory allocation decisions. (Playing with swappiness puts
the OOM killer on a hair trigger, depending on kernel version du jour.)

However, it's not guaranteed to do anything (the cached data could be
dirty, mmaped by some process, or immediately faulted back in by some
other process), so ignoring writes to drop_caches from a container is
probably legal behavior anyway.

Rob

* Re: Containers and /proc/sys/vm/drop_caches
From: Serge Hallyn @ 2011-01-07 15:12 UTC
To: Rob Landley; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Rob Landley (rlandley-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> Changing ownership so a script can't open a file that it otherwise
> could may cause scripts to fail when run in a container. Makes the
> containers less transparent.

While my goal next week is to make containers more transparent, the
official stance from kernel summit a few years ago was: transparent
containers are not a valid goal (as seen from kernel).

Not saying that what you're saying above is wrong, but I *do* argue
that 'silently ignoring the write' is more wrong than refusing the
write :) Fooling userspace is a lose, imo.

Also, we can use a FUSE fs over proc to hide the files. Doing that
now is insufficient because root in the container can just remount
proc over the filter. But after user namespaces, root in the container
has the choice of leaving the filter in place for the sake of his
own userspace, or removing it and getting a bunch of files he can't
use.

...

> A heavily loaded system that goes deep into swap without triggering
> the OOM killer can become pretty useless. My home laptop with 2 gigs

Isn't a cgroup that controls both memory and swap access the right
answer to this? (And do we have that now, btw?)

(I'm doing too many things at once so probably not thinking this
through enough)

-serge
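
The difference between the two behaviours debated here (Rob's "silently ignore the write" and Serge's "refuse the write") is what the writing program gets to observe. The sketch below models both as plain functions; `write_refused`, `write_ignored`, and `try_drop` are hypothetical names for illustration and do not correspond to any kernel code.

```python
import errno
from typing import Callable


def write_refused(data: bytes) -> int:
    """Model of refusing the write: the caller sees EPERM."""
    raise PermissionError(errno.EPERM, "drop_caches is not writable from this container")


def write_ignored(data: bytes) -> int:
    """Model of silently ignoring the write: report success, do nothing."""
    return len(data)


def try_drop(write_handler: Callable[[bytes], int]) -> str:
    """What a script inside the container can learn from its own write."""
    try:
        written = write_handler(b"3\n")
    except PermissionError:
        return "refused"
    return f"accepted {written} bytes"
```

With `write_ignored`, the script sees the same result as a real drop and cannot tell that nothing happened; with `write_refused`, it fails but at least knows why, which is the trade-off between transparency and not fooling userspace that the two messages argue over.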

* Re: Containers and /proc/sys/vm/drop_caches
From: Rob Landley @ 2011-01-08 12:39 UTC
To: Serge Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/07/2011 09:12 AM, Serge Hallyn wrote:
>> Changing ownership so a script can't open a file that it otherwise
>> could may cause scripts to fail when run in a container. Makes
>> the containers less transparent.
>
> While my goal next week is to make containers more transparent, the
> official stance from kernel summit a few years ago was: transparent
> containers are not a valid goal (as seen from kernel).

Do you have a reference for that? I'm still coming up to speed on all
this. Trying to collect documentation...

>> A heavily loaded system that goes deep into swap without triggering
>> the OOM killer can become pretty useless. My home laptop with 2
>> gigs
>
> Isn't a cgroup that controls both memory and swap access the right
> answer to this?

There are other ways to work around it, sure. (It's yet to be proven
that they do actually work better in resource constrained desktop
environments under real-world load, but they seem very promising.)

I was just pointing out that this has seen some use as a recovery
mechanism, slightly less drastic than the OOM killer. (Didn't say it was
a _good_ use. Also, error avoidance and error recovery are different
issues, and virtual memory is an inherently overcommitted resource
domain.)

> (And do we have that now, btw?)

I think it's coming, rather than actually here. (I thought the
beancounters stuff was OpenVZ, controlled by syscalls that the kernel
developers rejected. Have resource constraints on anything other than
scheduler made it into vanilla yet? If so, what's the UI to control
them?)

By the way, from a UI perspective, most of the containers stuff I've
seen so far is apparently aimed at big iron deployments (or attempts to
make PC clusters look like mainframes, I.E. this "cloud" stuff). I'm
glad to see more diverse uses of it, but one of the downsides of
cobbling together a mechanism from a dozen different unrelated pieces of
infrastructure (clone flags, cgroup filesystem, extra mount flags on
proc and such so they behave differently) is that we need a lot of
documentation/example code/libraries to make it easy to use. "You can do
X" and "it's easy to reliably do X" have a gap that may take a while to
close...

Rob
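
On the question of what the control interface looks like: the cgroup v1 memory controller, where it is compiled in and mounted, is driven by writing to files in a cgroup directory, with `memory.limit_in_bytes` for memory and `memory.memsw.limit_in_bytes` for memory plus swap. The sketch below assumes that layout and that swap accounting is enabled; whether a given kernel from this period provides it depends on its version and configuration, and nothing in the thread confirms it. The function name `set_memory_limits` and its parameters are hypothetical.

```python
from pathlib import Path


def set_memory_limits(cgroup_dir: Path, mem_bytes: int, memsw_bytes: int) -> None:
    """Write memory and memory+swap limits into a cgroup v1 memory cgroup directory.

    `cgroup_dir` is assumed to be an existing cgroup under a mounted
    memory controller hierarchy (the mount point varies by system).
    """
    if mem_bytes <= 0:
        raise ValueError("mem_bytes must be positive")
    if memsw_bytes < mem_bytes:
        # The combined limit covers memory + swap, so it cannot be smaller.
        raise ValueError("memsw_bytes must be >= mem_bytes")
    # Set the memory limit first so the combined limit is never below it
    # when starting from the unlimited defaults.
    (cgroup_dir / "memory.limit_in_bytes").write_text(str(mem_bytes))
    (cgroup_dir / "memory.memsw.limit_in_bytes").write_text(str(memsw_bytes))
```

Pointing `cgroup_dir` at a scratch directory shows what would be written without touching a live hierarchy.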

* Re: Containers and /proc/sys/vm/drop_caches
From: Serge Hallyn @ 2011-01-11 16:28 UTC
To: Rob Landley; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Rob Landley (rlandley-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 01/07/2011 09:12 AM, Serge Hallyn wrote:
> > While my goal next week is to make containers more transparent, the
> > official stance from kernel summit a few years ago was: transparent
> > containers are not a valid goal (as seen from kernel).
>
> Do you have a reference for that? I'm still coming up to speed on all
> this. Trying to collect documentation...

Sorry, I don't offhand, and a quick google search wasn't helpful. I
think it was from the very first containers discussion at ksummit, but
not sure.

There is http://lwn.net/Articles/191923/. Toward the bottom it claims
that no one thought it would be a problem to tweak distros to run in
containers without /sys and /proc. But this was 2006, when pid
namespaces were still a new idea, and no one was actually using
containers.

It certainly is possible that sentiment has changed, which is why I do
feel that it's worth it for someone to try some native containerization
inside fs/proc/*.c. While user namespaces should make it possible to
make fuse proc filtering less wishy-washy, they won't make it any less
ugly :)

-serge

* Containers and /proc/sys/vm/drop_caches
From: Mike Hommey @ 2010-12-30 7:59 UTC
To: linux-kernel

Hi,

I noticed that from within a lxc container, writing "3" to
/proc/sys/vm/drop_caches would flush the host page cache. That sounds a
little dangerous for VPS offerings that would be based on lxc, as the root
user in one VPS instance could impact the overall performance of the host.
I don't know about other containers but I've been told openvz isn't
subject to this problem.
I only tested the current Debian Squeeze kernel, which is based on
2.6.32.27.

Cheers,
Mike

* Re: Containers and /proc/sys/vm/drop_caches
From: Rob Landley @ 2010-12-30 8:57 UTC
To: Mike Hommey; +Cc: linux-kernel

On Thu, Dec 30, 2010 at 1:59 AM, Mike Hommey <mh@glandium.org> wrote:
> Hi,
>
> I noticed that from within a lxc container, writing "3" to
> /proc/sys/vm/drop_caches would flush the host page cache. That sounds a
> little dangerous for VPS offerings that would be based on lxc, as in one
> VPS instance root user could impact the overall performance of the host.

There's a containers@vger mailing list for this stuff, you might have
better luck asking there.

> I don't know about other containers but I've been told openvz isn't
> subject to this problem.

I've been coming up to speed on this area recently: openvz has a lot of
stuff that isn't in the main kernel, but it's based on an approach that
didn't get merged into the kernel (using new syscalls to control
container stuff).

Instead Google's rewrite of sgi's cgroup stuff went in for process
grouping (based on the cgroup filesystem), and a half-dozen different
types of namespaces are based on flags to clone(), and various other
filesystems (proc, sys, devpts) grew some kind of -o newinstance flag
(see http://lkml.indiana.edu/hypermail//linux/kernel/1012.3/00777.html
for a pending example, although why they can't detect they're the first
instance in the current container rather than containers having to be
specially set up by the host, I still don't understand yet)... and so
on.

The rest of the stuff openvz does is still being redesigned to go into
vanilla based on those mechanisms. It seems a bit like squashfs: vanilla
should be able to do this someday, but when it gets merged it may not be
compatible with the out of tree version.

LXC is an attempt to make a userspace tool to drive containers in the
vanilla kernel. It doesn't do half of what openvz does yet, but they're
working on it.

Rob

Thread overview: 14+ messages (end of thread, newest: 2011-01-11 16:28 UTC)
2011-01-05 9:40 Containers and /proc/sys/vm/drop_caches Mike Hommey
2011-01-05 9:49 ` Daniel Lezcano
2011-01-05 14:01 ` Serge Hallyn
2011-01-05 14:16 ` Balbir Singh
2011-01-06 21:43 ` Matt Helsley
2011-01-06 21:50 ` Dave Hansen
2011-01-06 22:08 ` Matt Helsley
2011-01-06 22:15 ` Dave Hansen
2011-01-07 13:03 ` Rob Landley
2011-01-07 15:12 ` Serge Hallyn
2011-01-08 12:39 ` Rob Landley
2011-01-11 16:28 ` Serge Hallyn
-- strict thread matches above, loose matches on Subject: below --
2010-12-30 7:59 Mike Hommey
2010-12-30 8:57 ` Rob Landley