linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jann Horn <jannh@google.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
Date: Tue, 15 May 2018 10:44:59 -0700	[thread overview]
Message-ID: <32016c04-8835-bf8f-74c7-254e28d815aa@oracle.com> (raw)
In-Reply-To: <4b21a648-0abc-e92d-ec37-681dff63bb55@oracle.com>



On 05/15/2018 10:40 AM, Nagarathnam Muthusamy wrote:
>
>
> On 05/15/2018 10:36 AM, Konstantin Khlebnikov wrote:
>>
>>
>> On 15.05.2018 20:19, Nagarathnam Muthusamy wrote:
>>>
>>>
>>> On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
>>>> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>>>>
>>>>>
>>>>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>>>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>>>>> Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> writes:
>>>>>>>
>>>>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>>>>> Each process have different pids, one for each pid namespace 
>>>>>>>>> it belongs.
>>>>>>>>> When interaction happens within single pid-ns translation 
>>>>>>>>> isn't required.
>>>>>>>>> More complicated scenarios needs special handling.
>>>>>>>>>
>>>>>>>>> For example:
>>>>>>>>> - reading pid-files or logs written inside container with pid 
>>>>>>>>> namespace
>>>>>>>>> - attaching with ptrace to tasks from different pid namespace
>>>>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>>>>
>>>>>>>>> Currently there are several interfaces that could be used here:
>>>>>>>>>
>>>>>>>>> Pid namespaces are identified by inode number of 
>>>>>>>>> /proc/[pid]/ns/pid.
>>>>>>>
>>>>>>> Using the inode number in interfaces is not an option. 
>>>>>>> Especially not
>>>>>>> withou referencing the device number for the filesystem as well.
>>>>>>
>>>>>> This is supposed to be single-instance fs,
>>>>>> not part of proc but referenced but its magic "symlinks".
>>>>>>
>>>>>> Device numbers are not mentioned in "man namespaces".
>>>>>>
>>>>>>>
>>>>>>>>> Pids for nested Pid namespaces are shown in file 
>>>>>>>>> /proc/[pid]/status.
>>>>>>>>> In some cases conversion pid -> vpid could be easily done 
>>>>>>>>> using this
>>>>>>>>> information, but backward translation requires scanning all 
>>>>>>>>> tasks.
>>>>>>>>>
>>>>>>>>> Unix socket automatically translates pid attached to 
>>>>>>>>> SCM_CREDENTIALS.
>>>>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and 
>>>>>>>>> entering
>>>>>>>>> into pid namespace, this expose process and could be insecure.
>>>>>>>>>
>>>>>>>>> This patch adds new syscall for converting pids between pid 
>>>>>>>>> namespaces:
>>>>>>>>>
>>>>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>>>>                                  int target_type, int target);
>>>>>>>>>
>>>>>>>>> @source_type and @target_type defines type of following 
>>>>>>>>> arguments:
>>>>>>>>>
>>>>>>>>> TRANSLATE_PID_CURRENT_PIDNS  - current pid namespace, argument 
>>>>>>>>> is unused
>>>>>>>>> TRANSLATE_PID_TASK_PIDNS     - task pid-ns, argument is task pid
>>>>>>>>
>>>>>>>> I believe using pid to represent the namespace has been already
>>>>>>>> discussed in V1 of this patch in 
>>>>>>>> https://lkml.org/lkml/2015/9/22/1087
>>>>>>>> after which we moved on to fd based version of this interface.
>>>>>>>
>>>>>>> Or in short why is the case of pids important?
>>>>>>>
>>>>>>> You Konstantin you almost said why they were important in your 
>>>>>>> message
>>>>>>> saying you were going to send this one.  However you don't 
>>>>>>> explain in
>>>>>>> your description why you want to identify pid namespaces by pid.
>>>>>>>
>>>>>>
>>>>>> Open of /proc/[pid]/ns/pid requires same permissions as ptrace,
>>>>>> pid based variant doesn't have such restrictions.
>>>>>
>>>>> Can you provide more information on usecase requiring PID 
>>>>> translation but not used for tracing related purposes?
>>>>
>>>> Any introspection for [nested] containers. It's easier to work when 
>>>> you have all information when you don't have any.
>>>> For example our CMS https://github.com/yandex/porto allows to start 
>>>> nested sub-container (or even deeper) by request from any container 
>>>> and have to tell back which pid task is have. And it could 
>>>> translate any pid inside into accessible by client and vice versa.
>>>>
>>>
>>> I still dont get the exact reason why PID based approach to identify 
>>> the namespace during pid translation process is absolutely required 
>>> compared to fd based approach. 
>>
>> As I told open(/proc/%d/ns/pid) have security restrictions - same 
>> uid/CAP_SYS_PTRACE/whatever
>> Pidns-fd holds pid-namespace and without restrictions could be abused.
>> Pid based API is racy but always available without any restrictions.
>
> I get that Pid based API is available without any restrictions but do 
> we have any existing usecase which requires Pid based API but cannot 
> use Pidns-fd based API? Most of the usecases discussed in this thread 
> deals with introspection of a process by another process and I believe 
> that security requirement for opening (/proc/%d/ns/pid) is required 
> for all such usecases. In other words, Why would a process which does 
> not belong to same uid 

Typo: inspection of a process by another process

Thanks,
Nagarathnam.

> of the process observed or have CAP_SYS_PTRACE be allowed to translate 
> PID?
>
> Thanks,
> Nagarathnam.
>>
>>
>>> From your version of TranslatePid in
>>>
>>> https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp 
>>>
>>>
>>> I see that you are going through the trouble of forking a process 
>>> and sending SMC_CREDENTIALS for pid translation. Even your existing 
>>> API could be extremely simplified if translate_pid based on file 
>>> descriptors make it to the gate and I believe from the last 
>>> discussion it was almost there 
>>> https://patchwork.kernel.org/patch/10305439/
>>>
>>>
>>>>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS 
>>>>> and TRANSLATE_PID_FD_PIDNS integrated first and then possibly 
>>>>> extend the interface to include TRANSLATE_PID_TASK_PIDNS in future?
>>>>
>>>> I don't see reason for this separation.
>>>> Pids and pid namespaces are part of the API for a long time.
>>>
>>> If you are talking about the translate_pid API proposed, I believe 
>>> the V4 proposed under https://patchwork.kernel.org/patch/10003935/ 
>>> had only fd based API before a mix of PID and fd based is proposed 
>>> in V5. Again, I was just wondering if we can get the FD based 
>>> approach in first and then extend the API to include PID based 
>>> approach later as fd based approach could provide a lot of immediate 
>>> benefits?
>>>
>>> Thanks,
>>> Nagarathnam.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Nagarathnam.
>>>>>> Most pid-based syscalls are racy in some cases but they are
>>>>>> here for decades and everybody knowns how to deal with it.
>>>>>> So, I've decided to merge both worlds in one interface which 
>>>>>> clearly tells what to expect.
>>>>>
>>>
>

  reply	other threads:[~2018-05-15 17:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-04 19:11 [PATCH RFC v5] pidns: introduce syscall translate_pid Konstantin Khlebnikov
2018-04-04 20:31 ` Nagarathnam Muthusamy
2018-04-04 22:29   ` Eric W. Biederman
2018-04-05  7:02     ` Konstantin Khlebnikov
2018-04-23 17:37       ` Nagarathnam Muthusamy
2018-04-25  5:36         ` Konstantin Khlebnikov
2018-05-15 17:19           ` Nagarathnam Muthusamy
2018-05-15 17:36             ` Konstantin Khlebnikov
2018-05-15 17:40               ` Nagarathnam Muthusamy
2018-05-15 17:44                 ` Nagarathnam Muthusamy [this message]
2018-05-31 17:41               ` Nagarathnam Muthusamy
2018-05-31 17:52                 ` Eric W. Biederman
2018-05-31 18:05                 ` Eric W. Biederman
2018-06-01  6:58                   ` Konstantin Khlebnikov
2018-06-01 15:55                     ` NAGARATHNAM MUTHUSAMY
2018-06-01 16:14                       ` Eric W. Biederman
2018-06-01 16:15                       ` Konstantin Khlebnikov
2018-04-27 12:15       ` Michael Kerrisk (man-pages)
2018-05-31 18:02         ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=32016c04-8835-bf8f-74c7-254e28d815aa@oracle.com \
    --to=nagarathnam.muthusamy@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=prakash.sangappa@oracle.com \
    --cc=serge.hallyn@ubuntu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).