From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael Kerrisk (man-pages)"
Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
Date: Sun, 31 Jul 2016 23:31:44 +0200
References: <1515f5f2-5a49-fcab-61f4-8b627d3ba3e2@gmail.com>
 <87lh0pg8jx.fsf@x220.int.ebiederm.org>
 <44ca0e41-dc92-45b1-2a6c-c41a048a072d@gmail.com>
 <87r3ahepb4.fsf@x220.int.ebiederm.org>
 <20160726025455.GC26206@outlook.office365.com>
 <3390535b-0660-757f-aeba-c03d936b3485@gmail.com>
 <20160726182524.GA328@outlook.office365.com>
 <20160726203955.GA9415@outlook.office365.com>
 <87popxkjjp.fsf@x220.int.ebiederm.org>
 <40e35f1a-10e6-b7a5-936e-a09f008be0d0@gmail.com>
 <87h9b8e2v7.fsf@x220.int.ebiederm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <87h9b8e2v7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: "Eric W. Biederman"
Cc: James Bottomley, Andrey Vagin, Andrew Vagin, Linux API, Linux Containers, LKML, Alexander Viro, "criu-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org", mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, linux-fsdevel
List-Id: linux-api@vger.kernel.org

Hi Eric,

On 07/29/2016 08:05 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> Hi Eric,
>>
>> On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" writes:
>>>
>>>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>>>
>>>>> If we want to compare two file descriptors of the current process,
>>>>> it is one of cases for which kcmp can be used.
>>>>> We can call kcmp to compare two namespaces which are opened in
>>>>> other processes.
>>>>
>>>> Is there really a use case there? I assume we're talking about the
>>>> scenario where a process in one namespace opens a /proc/PID/ns/*
>>>> file descriptor and passes that FD to another process via a UNIX
>>>> domain socket. Is that correct?
>>>>
>>>> So, supposing that we want to build a map of the relationships
>>>> between namespaces using the proposed kcmp() API, and there are
>>>> say N namespaces? Does this mean we make (N * (N-1) / 2) calls
>>>> to kcmp()?
>>>
>>> Potentially. The numbers are small enough O(N^2) isn't fatal.
>>
>> Define "small", please.
>>
>> O(N^2) makes me nervous about what other use cases lurk out
>> there that may get bitten by this.
>
> Worst case for N (One namespace per thread) is about 60k.

I'm getting an education here: where does the 60k number come from?

> A typical heavy use case may be 1000 namespaces of any type.
> So we are talking about O(N^2) that rarely happens and should be done in
> a couple of seconds.

I don't know whether that's acceptable for the migration use case,
but it seems quite bad for the visualization use case.

>>> Where kcmp shines is that it allows migration to happen. Inode numbers
>>> to change (which they very much will today), and still have things work.
>>
>>
>>> We can keep it O(Nlog(N)) by taking advantage of not just the equality
>>> but the ordering relationship. Although Ugh.
>>
>> Yes, that sounds pretty ugly...
>
> Actually having thought about this a little more if kcmp returns an
> ordering by inode and migration preserves the relative order of
> the inodes (which should just be a creation order) it should be quite
> solvable.
>
> Switch from an order by inode number to an order by object creation
> time, and guarantee that all creations have an order (which with
> task_list_lock we practically already have) and it should be even easier
> to create.
> (A 64bit nanosecond resolution timestamp is good for 544 years of
> uptime). A 64bit number that increments each time an object is
> created should have an even better lifespan.
>
> I don't know if we can find a way to give that guarantee for other kcmp
> comparisons but it is worth a thought.

Okay. So, this is a pathway to O(Nlog(N)) at least then?

>>> One disadvantage of
>>> kcmp currently is that the way the ordering relationship is defined
>>> the order is not preserved over migration :(
>>
>> So, does kcmp() fully solve the problem(s) at hand? It sounds like
>> not, if I understand your last point correctly.
>
> There are 3 possibilities I see for migration in migration, ordered
> in order of implementation difficulty.
> 1) Have a clear signal that migration happened and a nested migration
>    needs to restart.
> 2) Use kcmp so that only the relative order needs to be preserved.
> 3) Preserve the device number and inode numbers.
>
> At a practical level I think (2) may actually in net be the simplest.
> It requires a little more care to implement and you have to opt in,
> but it should not require any rolling back of activity (merely careful
> ordering of object creation).
>
> I definitely like kcmp knowing how to compare things by inode
> (aka st_dev, st_ino) because then even if you have to restart
> the comparisons after a migration the exact details you are comparing
> are hidden and so it is easier to support and harder to get wrong.
>
> I can imagine how to preserve inode numbers by creating a new nsfs
> instance and using the old inode numbers upon restore. I don't
> currently see how we could possibly preserve st_dev over migration short of
> a device number namespace.
>
> So if we are going to continue with making device numbers be a legacy
> attribute applications should not care about we need a way to compare
> things by not looking at st_dev. Which brings us back to kcmp.
>
> Hmm.
> Hotplugging a disk and plugging it back likely will change the
> device number and give the same kind of challenge with st_dev (although
> you can't keep a file descriptor open across that kind of event). So
> certainly a hotplug event on a device should be enough to say don't care
> about the device number.

Okay.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
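
Editorial sketch, not part of the original message: the O(N^2) versus
O(Nlog(N)) point in the thread can be illustrated with a small simulation.
The comparator below is only a stand-in for the namespace comparison
proposed in this thread; it is not a binding to kcmp(2), and the names
CountingCmp, pair_count, group_pairwise and group_by_sorting are
hypothetical. It assumes the comparison yields a consistent total order
(for example by creation order), which is exactly the guarantee being
debated above.

```python
from functools import cmp_to_key
from itertools import combinations


class CountingCmp:
    """Stand-in for an ordered namespace comparison (hypothetical).

    ns_id_of maps an opaque handle to a hidden ordering key, playing the
    role of the inode number or creation counter that the caller never
    sees directly. Each call is counted.
    """

    def __init__(self, ns_id_of):
        self.ns_id_of = ns_id_of
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        x, y = self.ns_id_of[a], self.ns_id_of[b]
        return (x > y) - (x < y)  # -1, 0 or 1


def pair_count(n):
    """Number of unordered pairs among n items: N * (N-1) / 2."""
    return n * (n - 1) // 2


def group_pairwise(handles, cmp):
    """Use only equality: compare every pair of handles."""
    groups = {h: {h} for h in handles}
    for a, b in combinations(handles, 2):
        if cmp(a, b) == 0:
            merged = groups[a] | groups[b]
            for h in merged:
                groups[h] = merged
    return {frozenset(g) for g in groups.values()}


def group_by_sorting(handles, cmp):
    """Use the ordering: sort, then split runs of equal neighbours."""
    ordered = sorted(handles, key=cmp_to_key(cmp))
    groups, current = [], []
    for h in ordered:
        if current and cmp(current[-1], h) != 0:
            groups.append(frozenset(current))
            current = []
        current.append(h)
    if current:
        groups.append(frozenset(current))
    return set(groups)


if __name__ == "__main__":
    # 1000 handles referring to 300 distinct (simulated) namespaces.
    handles = list(range(1000))
    ns_id_of = {h: (h * 7919) % 300 for h in handles}

    pairwise = CountingCmp(ns_id_of)
    by_sort = CountingCmp(ns_id_of)
    assert group_pairwise(handles, pairwise) == group_by_sorting(handles, by_sort)

    print("pairwise comparisons:", pairwise.calls)
    print("sort-based comparisons:", by_sort.calls)
    print("pairs for N = 60000:", pair_count(60000))
```

For N = 1000 the pairwise approach makes 499500 comparisons, and for the
60k worst case mentioned above it would need 1799970000. The sort-based
grouping needs far fewer, but only works if the comparison defines a
stable total order across all handles.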