From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexey Dobriyan Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do? Date: Sun, 1 Mar 2009 23:56:59 +0300 Message-ID: <20090301205659.GA7276@x200.localdomain> References: <1234467035.3243.538.camel@calx> <20090212114207.e1c2de82.akpm@linux-foundation.org> <1234475483.30155.194.camel@nimitz> <20090212141014.2cd3d54d.akpm@linux-foundation.org> <1234479845.30155.220.camel@nimitz> <20090226162755.GB1456@x200.localdomain> <20090226173302.GB29439@elte.hu> <20090226223112.GA2939@x200.localdomain> <20090301013304.GA2428@x200.localdomain> <20090301200231.GA25276@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <20090301200231.GA25276-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: "Serge E. Hallyn" Cc: Ingo Molnar , linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Hansen , linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org, Andrew Morton , torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org, xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org List-Id: linux-api@vger.kernel.org On Sun, Mar 01, 2009 at 02:02:31PM -0600, Serge E. Hallyn wrote: > Quoting Alexey Dobriyan (adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org): > > On Fri, Feb 27, 2009 at 01:31:12AM +0300, Alexey Dobriyan wrote: > > > This is collecting and start of dumping part of cleaned up OpenVZ C/R > > > implementation, FYI. > > > > OK, here is second version which shows what to do with shared objects > > (cr_dump_nsproxy(), cr_dump_task_struct()), introduced more checks > > (still no unlinked files) and dumps some more information including > > structures connections (cr_pos_*) > > > > Dumping pids in under thinking because in OpenVZ pids are saved as > > numbers due to CLONE_NEWPID is not allowed in container. In presense > > of multiple CLONE_NEWPID levels this must present a big problem. Looks > > like there is now way to not dump pids as separate object. > > > > As result, struct cr_image_pid is variable-sized, don't know how this will > > play later. > > > > Also, pid refcount check for external pointers is busted right now, > > because /proc inode pins struct pid, so there is almost always refcount > > vs ->o_count mismatch. > > > > No restore yet. ;-) > > Hi Alexey, > > thanks for posting this. Of course there are some predictable responses > (I like the simplicity of pure in-kernel, Dave will not :) but this > needs to be posted to make us talk about it. > > A few more comments that came to me while looking it over: > > 1. cap_sys_admin check is unfortunate. In discussions about Oren's > patchset we've agreed that not having that check from the outset forces > us to consider security with each new patch and feature, which is a good > thing. Removing CAP_SYS_ADMIN on restore? > 2. if any tasks being checkpointed are frozen, checkpoint has the > side effect of thawing them, right? Haven't tried, but should be a bug, yes. It will be "thaw or kill" depending on "flags". > 3. wrt pids, i guess what you really want is to store the pids from > init_tsk's level down to the task's lowest pid, right? Then you > manually set each of those on restart? Any higher pids of course > don't matter. Yes, numbers are really meant to be from init_tsk level. > 4. do you have any thoughts on what to do with the mntns info at > restart? Will you try to detect mounts which need to be re-created? > How? Haven't thought, but it will be tricky for sure :^) > 5. Since you're always setting f_pos, this won't work straight over > a pipe? Do you figure that's just not a worthwhile feature? So far there were no loops when dumping data structures, but I _think_ there will be some, so seeking over dumpfile would be inevitable. > Were you saying (in response to Dave) that you're having private > discussions about whether to pursue posting this as an alternative > to Oren's patchset? If so, any updates on those discussions? Right now, no. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html