From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: mtk.manpages@gmail.com, Andrew Vagin <avagin@virtuozzo.com>,
Andrey Vagin <avagin@openvz.org>,
"Serge E. Hallyn" <serge@hallyn.com>,
"criu@openvz.org" <criu@openvz.org>,
Linux API <linux-api@vger.kernel.org>,
Linux Containers <containers@lists.linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
Date: Sun, 31 Jul 2016 23:31:44 +0200
Message-ID: <e2527d9a-58e4-9784-7e75-f08e6aa61930@gmail.com>
In-Reply-To: <87h9b8e2v7.fsf@x220.int.ebiederm.org>

Hi Eric,

On 07/29/2016 08:05 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>>
>>>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>>>
>>>>> If we want to compare two file descriptors of the current process,
>>>>> it is one of the cases for which kcmp can be used. We can call kcmp to
>>>>> compare two namespaces which are opened in other processes.
>>>>
>>>> Is there really a use case there? I assume we're talking about the
>>>> scenario where a process in one namespace opens a /proc/PID/ns/*
>>>> file descriptor and passes that FD to another process via a UNIX
>>>> domain socket. Is that correct?
>>>>
>>>> So, supposing that we want to build a map of the relationships
>>>> between namespaces using the proposed kcmp() API, and there are
>>>> say N namespaces? Does this mean we make (N * (N-1) / 2) calls
>>>> to kcmp()?
>>>
>>> Potentially. The numbers are small enough O(N^2) isn't fatal.
>>
>> Define "small", please.
>>
>> O(N^2) makes me nervous about what other use cases lurk out
>> there that may get bitten by this.
>
> Worst case for N (One namespace per thread) is about 60k.

I'm getting an education here: where does the 60k number come from?

> A typical heavy use case may be 1000 namespaces of any type.
> So we are talking about O(N^2) that rarely happens and should be done in
> a couple of seconds.
I don't know whether that's acceptable for the migration use case,
but seems quite bad for the visualization use case.
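To put rough numbers on that, here is a minimal sketch (not part of the
patch series) of how many kcmp() calls a naive all-pairs comparison would
need. The function name pairwise_calls is made up for illustration; the
two values of N are the ones quoted above.

```python
def pairwise_calls(n: int) -> int:
    """Number of unordered pairs among n namespaces: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# N = 1000 is the "typical heavy use" figure, N = 60000 the worst case.
for n in (1000, 60_000):
    print(f"N = {n:>6}: {pairwise_calls(n):>13,} comparisons")
```

That is roughly half a million calls for the typical heavy case and about
1.8 billion for the worst case. How long either takes depends on the cost
of a single kcmp() call, which this sketch does not attempt to measure.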
>>> Where kcmp shines is that it allows migration to happen, inode numbers
>>> to change (which they very much will today), and things to still work.
>>
>>
>>> We can keep it O(Nlog(N)) by taking advantage of not just the equality
>>> but the ordering relationship. Although Ugh.
>>
>> Yes, that sounds pretty ugly...
>
> Actually having thought about this a little more if kcmp returns an
> ordering by inode and migration preserves the relative order of
> the inodes (which should just be a creation order) it should be quite
> solvable.
>
> Switch from an order by inode number to an order by object creation
> time, and guarantee that all creations have an order (which with
> task_list_lock we practically already have) and it should be even easier
> to create. (A 64bit nanosecond resolution timestamp is good for about 584
> years of uptime). A 64bit number that increments each time an object is
> created should have an even better lifespan.
>
> I don't know if we can find a way to give that guarantee for other kcmp
> comparisons but it is worth a thought.
Okay. So, this is a pathway to O(Nlog(N)) at least then?
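As I understand the idea, given a comparison that returns a consistent
total order, the handles can be sorted and equal ones then sit next to each
other. A minimal sketch under that assumption follows. group_by_order and
fake_compare are hypothetical names, and fake_compare is only a stand-in
for a kernel-side comparison, not the kcmp() interface itself.

```python
from functools import cmp_to_key


def group_by_order(handles, compare):
    """Group handles that compare equal, using only a three-way comparator.

    compare(a, b) must return a negative number, zero, or a positive number
    and must define a consistent total order.
    """
    ordered = sorted(handles, key=cmp_to_key(compare))
    groups = []
    for h in ordered:
        # After sorting, equal handles are adjacent, so comparing against
        # the last member of the last group is sufficient.
        if groups and compare(groups[-1][-1], h) == 0:
            groups[-1].append(h)
        else:
            groups.append([h])
    return groups


# Stand-in comparator: each handle is a (name, object_id) pair and the
# ordering is by object_id. A real comparison would happen in the kernel.
calls = 0


def fake_compare(a, b):
    global calls
    calls += 1
    return (a[1] > b[1]) - (a[1] < b[1])


# 1000 handles referring to 100 distinct objects.
handles = [("fd%d" % i, i % 100) for i in range(1000)]
groups = group_by_order(handles, fake_compare)
print(len(groups), "distinct objects found with", calls, "comparisons")
```

The sort needs O(N log N) comparisons and the grouping pass adds at most
N - 1 more, compared with N * (N-1) / 2 for the all-pairs approach.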
>>> One disadvantage of
>>> kcmp currently is that the way the ordering relationship is defined
>>> the order is not preserved over migration :(
>>
>> So, does kcmp() fully solve the problem(s) at hand? It sounds like
>> not, if I understand your last point correctly.
>
> There are 3 possibilities I see for migration in migration, ordered
> in order of implementation difficulty.
> 1) Have a clear signal that migration happened and a nested migration
> needs to restart.
> 2) Use kcmp so that only the relative order needs to be preserved.
> 3) Preserve the device number and inode numbers.
>
> At a practical level I think (2) may actually in net be the simplest.
> It requires a little more care to implement and you have to opt in,
> but it should not require any rolling back of activity (merely careful
> ordering of object creation).
>
> I definitely like kcmp knowing how to compare things by inode
> (aka st_dev, st_ino) because then even if you have to restart
> the comparisons after a migration the exact details you are comparing
> are hidden and so it is easier to support and harder to get wrong.
>
> I can imagine how to preserve inode numbers by creating a new
> nsfs instance and using the old inode numbers upon restore. I don't
> currently see how we could possibly preserve st_dev over migration short of
> a device number namespace.
>
> So if we are going to continue with making device numbers be a legacy
> attribute applications should not care about, we need a way to compare
> things by not looking at st_dev. Which brings us back to kcmp.
>
> Hmm. Unplugging a disk and plugging it back will likely change the
> device number and give the same kind of challenge with st_dev (although
> you can't keep a file descriptor open across that kind of event). So
> certainly a hotplug event on a device should be enough to say don't care
> about the device number.
Okay.
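For reference, the user-space comparison by device and inode number that
this discussion wants to move away from can be sketched as follows. This
is an illustration only; object_identity and same_object are made-up
helper names, and the /proc paths are just example inputs.

```python
import os


def object_identity(path):
    """Return the (st_dev, st_ino) pair identifying the object behind path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_object(path_a, path_b):
    """True if both paths refer to the same underlying object."""
    return object_identity(path_a) == object_identity(path_b)


if __name__ == "__main__":
    # On Linux, /proc/<pid>/ns/<type> entries can be compared this way.
    # Reading another process's entry may require extra privileges.
    a, b = "/proc/self/ns/net", "/proc/1/ns/net"
    try:
        print(a, "and", b, "same namespace:", same_object(a, b))
    except OSError as exc:
        print("cannot stat:", exc)
```

The weakness is the one described above: both numbers are only meaningful
on the running system, so a result recorded before a migration cannot be
relied on afterwards unless those numbers are preserved.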
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/