* [PATCH] [RFC] Make core_pattern support namespace
@ 2016-02-04 12:16 Zhao Lei
2016-02-16 14:26 ` [PATCH] " Mateusz Guzik
0 siblings, 1 reply; 8+ messages in thread
From: Zhao Lei @ 2016-02-04 12:16 UTC (permalink / raw)
To: linux-kernel; +Cc: Zhao Lei
Currently, each container shared one copy of coredump setting
with the host system, if host system changed the setting, each
running containers will be affected.
Moreover, it is not easy to let each container keeping their own
coredump setting.
We can use some workaround as pipe program to make the second
requirement possible, but it is not simple, and both host and
container are limited to set to fixed pipe program.
In one word, for host running contailer, we can't change core_pattern
anymore.
To make the problem more hard, if a host running more than one
container product, each product will try to snatch the global
coredump setting to fit their own requirement.
For container based on namespace design, it is good to allow
each container keeping their own coredump setting.
It will bring us following benefit:
1: Each container can change their own coredump setting
based on operation on /proc/sys/kernel/core_pattern
2: Coredump setting changed in host will not affect
running containers.
3: Support both case of "putting coredump in guest" and
"putting curedump in host".
Each namespace-based software(lxc, docker, ..) can use this function
to custom their dump setting.
And this function makes each continer working as separate system,
it fit for design goal of namespace.
Test(in lxc):
# In the host
# ----------------
# echo host_core >/proc/sys/kernel/core_pattern
# cat /proc/sys/kernel/core_pattern
host_core
# ulimit -c 1024000
# ./make_dump
Segmentation fault (core dumped)
# ls -l
-rw------- 1 root root 331776 Feb 4 18:02 host_core.2175
-rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump
#
# In the container
# ----------------
# cat /proc/sys/kernel/core_pattern
host_core
# echo container_core >/proc/sys/kernel/core_pattern
# ./make_dump
Segmentation fault (core dumped)
# ls -l
-rwxr-xr-x 1 root root 759731 Feb 4 10:45 make_dump
-rw------- 1 root root 331776 Feb 4 10:45 container_core.16
#
# Return to host
# ----------------
# cat /proc/sys/kernel/core_pattern
host_core
# ls
host_core.2175 make_dump make_dump.c
# rm -f host_core.2175
# ./make_dump
Segmentation fault (core dumped)
# ls -l
-rw------- 1 root root 331776 Feb 4 18:49 host_core.2351
-rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump
#
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
fs/coredump.c | 3 +--
include/linux/pid_namespace.h | 2 ++
kernel/pid.c | 1 +
kernel/pid_namespace.c | 3 +++
kernel/sysctl.c | 22 ++++++++++++++++------
5 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 9ea87e9..8a7ef9b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -46,7 +46,6 @@
int core_uses_pid;
unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
static int core_name_size = CORENAME_MAX_SIZE;
struct core_name {
@@ -183,7 +182,7 @@ put_exe_file:
static int format_corename(struct core_name *cn, struct coredump_params *cprm)
{
const struct cred *cred = current_cred();
- const char *pat_ptr = core_pattern;
+ const char *pat_ptr = current->nsproxy->pid_ns_for_children->core_pattern;
int ispipe = (*pat_ptr == '|');
int pid_in_pattern = 0;
int err = 0;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..a5af1e9 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -9,6 +9,7 @@
#include <linux/nsproxy.h>
#include <linux/kref.h>
#include <linux/ns_common.h>
+#include <linux/binfmts.h>
struct pidmap {
atomic_t nr_free;
@@ -45,6 +46,7 @@ struct pid_namespace {
int hide_pid;
int reboot; /* group exit code if this pidns was rebooted */
struct ns_common ns;
+ char core_pattern[CORENAME_MAX_SIZE];
};
extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid.c b/kernel/pid.c
index 4d73a83..c79c1d5 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -83,6 +83,7 @@ struct pid_namespace init_pid_ns = {
#ifdef CONFIG_PID_NS
.ns.ops = &pidns_operations,
#endif
+ .core_pattern = "core",
};
EXPORT_SYMBOL_GPL(init_pid_ns);
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..16d6d21 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -123,6 +123,9 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
for (i = 1; i < PIDMAP_ENTRIES; i++)
atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
+ strncpy(ns->core_pattern, parent_pid_ns->core_pattern,
+ sizeof(ns->core_pattern));
+
return ns;
out_free_map:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 97715fd..70f8af5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -100,7 +100,6 @@
extern int suid_dumpable;
#ifdef CONFIG_COREDUMP
extern int core_uses_pid;
-extern char core_pattern[];
extern unsigned int core_pipe_limit;
#endif
extern int pid_max;
@@ -469,7 +468,7 @@ static struct ctl_table kern_table[] = {
},
{
.procname = "core_pattern",
- .data = core_pattern,
+ .data = NULL,
.maxlen = CORENAME_MAX_SIZE,
.mode = 0644,
.proc_handler = proc_dostring_coredump,
@@ -2301,6 +2300,8 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
static void validate_coredump_safety(void)
{
#ifdef CONFIG_COREDUMP
+ char *core_pattern = current->nsproxy->pid_ns_for_children->core_pattern;
+
if (suid_dumpable == SUID_DUMP_ROOT &&
core_pattern[0] != '/' && core_pattern[0] != '|') {
printk(KERN_WARNING "Unsafe core_pattern used with "\
@@ -2323,10 +2324,19 @@ static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
static int proc_dostring_coredump(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- int error = proc_dostring(table, write, buffer, lenp, ppos);
- if (!error)
- validate_coredump_safety();
- return error;
+ int ret;
+
+ if (write && *ppos && sysctl_writes_strict == SYSCTL_WRITES_WARN)
+ warn_sysctl_write(table);
+
+ ret = _proc_do_string(current->nsproxy->pid_ns_for_children->core_pattern,
+ table->maxlen, write,
+ (char __user *)buffer, lenp, ppos);
+ if (ret)
+ return ret;
+
+ validate_coredump_safety();
+ return 0;
}
#endif
--
1.8.5.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] Make core_pattern support namespace
2016-02-04 12:16 [PATCH] [RFC] Make core_pattern support namespace Zhao Lei
@ 2016-02-16 14:26 ` Mateusz Guzik
2016-02-17 20:15 ` Eric W. Biederman
2016-02-18 11:13 ` Zhao Lei
0 siblings, 2 replies; 8+ messages in thread
From: Mateusz Guzik @ 2016-02-16 14:26 UTC (permalink / raw)
To: Zhao Lei; +Cc: linux-kernel, containers
On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> Currently, each container shared one copy of coredump setting
> with the host system, if host system changed the setting, each
> running containers will be affected.
>
> Moreover, it is not easy to let each container keeping their own
> coredump setting.
>
> We can use some workaround as pipe program to make the second
> requirement possible, but it is not simple, and both host and
> container are limited to set to fixed pipe program.
> In one word, for host running contailer, we can't change core_pattern
> anymore.
> To make the problem more hard, if a host running more than one
> container product, each product will try to snatch the global
> coredump setting to fit their own requirement.
>
> For container based on namespace design, it is good to allow
> each container keeping their own coredump setting.
>
> It will bring us following benefit:
> 1: Each container can change their own coredump setting
> based on operation on /proc/sys/kernel/core_pattern
> 2: Coredump setting changed in host will not affect
> running containers.
> 3: Support both case of "putting coredump in guest" and
> "putting curedump in host".
>
> Each namespace-based software(lxc, docker, ..) can use this function
> to custom their dump setting.
>
> And this function makes each continer working as separate system,
> it fit for design goal of namespace.
>
Sorry if this is a false alarm, I don't have easy means to test it, but
is not this an immediate privilege escalation?
In particular, if the new pattern was to include a pipe, would not it
cause the kernel to run whatever the namespace put in there, but on the
"host"?
The feature definitely looks useful, but this is another privileged
operation which is now made available to unprivileged users. This poses
some questions:
- should the namespace be allowed to set anything it wants, including
pipes? I would argue the latter should be disallowed for simplicity
- now that the pattern can change at any time by namespace root, it
becomes fishy to parse it for filename generation without any
protection against concurrent modification
- how do you ensure that namespaces which did not explicitely set their
pattern still get updates from the host?
That said, I suggest adding an allocated buffer to hold it or be NULL
otherwise. If the pointer is NULL, the "host" pattern is used.
For dumping purposes the code could just kmalloc, rcu_read_lock, memcpy
and be done with it. Or, if it is acceptable given the size, just have a
buffer on the stack.
Finally, the code setting it could simply xchg the pointer to avoid
locks if there is no good lock to be taken here, and then kfree_rcu the
old buffer.
Just my $0,03.
--
Mateusz Guzik
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make core_pattern support namespace
2016-02-16 14:26 ` [PATCH] " Mateusz Guzik
@ 2016-02-17 20:15 ` Eric W. Biederman
2016-02-17 20:54 ` Mateusz Guzik
2016-02-18 11:13 ` Zhao Lei
1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2016-02-17 20:15 UTC (permalink / raw)
To: Mateusz Guzik; +Cc: Zhao Lei, containers, linux-kernel
Mateusz Guzik <mguzik@redhat.com> writes:
> On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
>> Currently, each container shared one copy of coredump setting
>> with the host system, if host system changed the setting, each
>> running containers will be affected.
>>
>> Moreover, it is not easy to let each container keeping their own
>> coredump setting.
>>
>> We can use some workaround as pipe program to make the second
>> requirement possible, but it is not simple, and both host and
>> container are limited to set to fixed pipe program.
>> In one word, for host running contailer, we can't change core_pattern
>> anymore.
>> To make the problem more hard, if a host running more than one
>> container product, each product will try to snatch the global
>> coredump setting to fit their own requirement.
>>
>> For container based on namespace design, it is good to allow
>> each container keeping their own coredump setting.
>>
>> It will bring us following benefit:
>> 1: Each container can change their own coredump setting
>> based on operation on /proc/sys/kernel/core_pattern
>> 2: Coredump setting changed in host will not affect
>> running containers.
>> 3: Support both case of "putting coredump in guest" and
>> "putting curedump in host".
>>
>> Each namespace-based software(lxc, docker, ..) can use this function
>> to custom their dump setting.
>>
>> And this function makes each continer working as separate system,
>> it fit for design goal of namespace.
>>
>
> Sorry if this is a false alarm, I don't have easy means to test it, but
> is not this an immediate privilege escalation?
It is. This is why we do not currently have a per namespace setting.
Solving the user mode helper problem is technically a fair amount of
work, if not theoretically challenging.
Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make core_pattern support namespace
2016-02-17 20:15 ` Eric W. Biederman
@ 2016-02-17 20:54 ` Mateusz Guzik
2016-02-18 11:59 ` Zhao Lei
0 siblings, 1 reply; 8+ messages in thread
From: Mateusz Guzik @ 2016-02-17 20:54 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Zhao Lei, containers, linux-kernel
On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> Mateusz Guzik <mguzik@redhat.com> writes:
> > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> >> For container based on namespace design, it is good to allow
> >> each container keeping their own coredump setting.
> >
> > Sorry if this is a false alarm, I don't have easy means to test it, but
> > is not this an immediate privilege escalation?
>
> It is. This is why we do not currently have a per namespace setting.
>
Thanks for confimation.
> Solving the user mode helper problem is technically a fair amount of
> work, if not theoretically challenging.
>
Well, I would say custom core_patterns without pipe support are still
better than none.
Say one would ensure a stable core_pattern (i.e. that it cannot be
modified as it is being parsed) and a restricted set of allowed
characters in the pattern (which would not include the pipe), validated
when one attempts to set the pattern.
Does this sound acceptable? If so, and there are no counter ideas from
Lei, I can get around to that.
--
Mateusz Guzik
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] Make core_pattern support namespace
2016-02-17 20:54 ` Mateusz Guzik
@ 2016-02-18 11:59 ` Zhao Lei
2016-02-18 20:18 ` Eric W. Biederman
0 siblings, 1 reply; 8+ messages in thread
From: Zhao Lei @ 2016-02-18 11:59 UTC (permalink / raw)
To: 'Mateusz Guzik', 'Eric W. Biederman'
Cc: containers, linux-kernel
Hi, Mateusz Guzik
> -----Original Message-----
> From: Mateusz Guzik [mailto:mguzik@redhat.com]
> Sent: Thursday, February 18, 2016 4:54 AM
> To: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Zhao Lei <zhaolei@cn.fujitsu.com>; containers@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] Make core_pattern support namespace
>
> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> > Mateusz Guzik <mguzik@redhat.com> writes:
> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> > >> For container based on namespace design, it is good to allow
> > >> each container keeping their own coredump setting.
> > >
> > > Sorry if this is a false alarm, I don't have easy means to test it, but
> > > is not this an immediate privilege escalation?
> >
> > It is. This is why we do not currently have a per namespace setting.
> >
>
> Thanks for confimation.
>
> > Solving the user mode helper problem is technically a fair amount of
> > work, if not theoretically challenging.
> >
>
> Well, I would say custom core_patterns without pipe support are still
> better than none.
>
+1.
> Say one would ensure a stable core_pattern (i.e. that it cannot be
> modified as it is being parsed) and a restricted set of allowed
> characters in the pattern (which would not include the pipe), validated
> when one attempts to set the pattern.
>
> Does this sound acceptable? If so, and there are no counter ideas from
> Lei, I can get around to that.
>
If we can let kernel select pipe_program in vm's filesystem, and run
pipe_program with vm's filesystem, set a pipe for core_patterm in vm
will be safe.
What is your opinion on above solution?
If above way is not acceptable, or impossible to realize, I also
agree your solution of limit vm setting pipe.
Thanks
Zhaolei
> --
> Mateusz Guzik
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make core_pattern support namespace
2016-02-18 11:59 ` Zhao Lei
@ 2016-02-18 20:18 ` Eric W. Biederman
2016-02-19 10:24 ` Zhao Lei
0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2016-02-18 20:18 UTC (permalink / raw)
To: Zhao Lei; +Cc: 'Mateusz Guzik', containers, linux-kernel
Zhao Lei <zhaolei@cn.fujitsu.com> writes:
> Hi, Mateusz Guzik
>
>> -----Original Message-----
>> From: Mateusz Guzik [mailto:mguzik@redhat.com]
>> Sent: Thursday, February 18, 2016 4:54 AM
>> To: Eric W. Biederman <ebiederm@xmission.com>
>> Cc: Zhao Lei <zhaolei@cn.fujitsu.com>; containers@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] Make core_pattern support namespace
>>
>> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
>> > Mateusz Guzik <mguzik@redhat.com> writes:
>> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
>> > >> For container based on namespace design, it is good to allow
>> > >> each container keeping their own coredump setting.
>> > >
>> > > Sorry if this is a false alarm, I don't have easy means to test it, but
>> > > is not this an immediate privilege escalation?
>> >
>> > It is. This is why we do not currently have a per namespace setting.
>> >
>>
>> Thanks for confimation.
>>
>> > Solving the user mode helper problem is technically a fair amount of
>> > work, if not theoretically challenging.
>> >
>>
>> Well, I would say custom core_patterns without pipe support are still
>> better than none.
>>
> +1.
-1.
The problem is solvable. It is just a matter of effort to build the
necessary infrastructure and make certain everything works correctly.
>> Say one would ensure a stable core_pattern (i.e. that it cannot be
>> modified as it is being parsed) and a restricted set of allowed
>> characters in the pattern (which would not include the pipe), validated
>> when one attempts to set the pattern.
>>
>> Does this sound acceptable? If so, and there are no counter ideas from
>> Lei, I can get around to that.
>>
> If we can let kernel select pipe_program in vm's filesystem, and run
> pipe_program with vm's filesystem, set a pipe for core_patterm in vm
> will be safe.
> What is your opinion on above solution?
Please see the other thread about user mode helpers that is current
active on the container mailling list.
> If above way is not acceptable, or impossible to realize, I also
> agree your solution of limit vm setting pipe.
I honestly think have a fully capable system that we have now that is
capable of using setns and entering a containers context is better than
something half baked. The solution either needs to support everything
core_pattern does today but correctly in a container environment.
To make the case that something does not need to be supported, a
convincing argument needs to be presented and tested that no one ever
does that. Without such an argument you will be breaking userspace
in a different way. Not actually fixing things.
My baseline reference implementation of all of this is that it is
possible when a sufficiently privileged process writes to core_pattern
to fork a child with the same environment and context as the writer.
That forked child could then become a kernel thread and fork any
additional children needed as user mode helpers.
Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] Make core_pattern support namespace
2016-02-18 20:18 ` Eric W. Biederman
@ 2016-02-19 10:24 ` Zhao Lei
0 siblings, 0 replies; 8+ messages in thread
From: Zhao Lei @ 2016-02-19 10:24 UTC (permalink / raw)
To: 'Eric W. Biederman'
Cc: 'Mateusz Guzik', containers, linux-kernel
Hi, Biederman
> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xmission.com]
> Sent: Friday, February 19, 2016 4:18 AM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: 'Mateusz Guzik' <mguzik@redhat.com>;
> containers@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] Make core_pattern support namespace
>
> Zhao Lei <zhaolei@cn.fujitsu.com> writes:
>
> > Hi, Mateusz Guzik
> >
> >> -----Original Message-----
> >> From: Mateusz Guzik [mailto:mguzik@redhat.com]
> >> Sent: Thursday, February 18, 2016 4:54 AM
> >> To: Eric W. Biederman <ebiederm@xmission.com>
> >> Cc: Zhao Lei <zhaolei@cn.fujitsu.com>;
> containers@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH] Make core_pattern support namespace
> >>
> >> On Wed, Feb 17, 2016 at 02:15:24PM -0600, Eric W. Biederman wrote:
> >> > Mateusz Guzik <mguzik@redhat.com> writes:
> >> > > On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> >> > >> For container based on namespace design, it is good to allow
> >> > >> each container keeping their own coredump setting.
> >> > >
> >> > > Sorry if this is a false alarm, I don't have easy means to test it, but
> >> > > is not this an immediate privilege escalation?
> >> >
> >> > It is. This is why we do not currently have a per namespace setting.
> >> >
> >>
> >> Thanks for confimation.
> >>
> >> > Solving the user mode helper problem is technically a fair amount of
> >> > work, if not theoretically challenging.
> >> >
> >>
> >> Well, I would say custom core_patterns without pipe support are still
> >> better than none.
> >>
> > +1.
>
> -1.
>
> The problem is solvable. It is just a matter of effort to build the
> necessary infrastructure and make certain everything works correctly.
>
Writting a pipe for both host and container have some limit:
1: All host who wantting to run container can not custom core_patterns
to other value, it is to say, core_patterns will turn to be a const
value in linux release with container support.
2: If a host support 2 types of container manager, for example,
lxc and docker, each manager will try to modify host's core_patterns
to its internal pipe program, and cause competition.
3: container can not modify core_patterns for its need, it is not like
a real system.
> >> Say one would ensure a stable core_pattern (i.e. that it cannot be
> >> modified as it is being parsed) and a restricted set of allowed
> >> characters in the pattern (which would not include the pipe), validated
> >> when one attempts to set the pattern.
> >>
> >> Does this sound acceptable? If so, and there are no counter ideas from
> >> Lei, I can get around to that.
> >>
> > If we can let kernel select pipe_program in vm's filesystem, and run
> > pipe_program with vm's filesystem, set a pipe for core_patterm in vm
> > will be safe.
> > What is your opinion on above solution?
>
> Please see the other thread about user mode helpers that is current
> active on the container mailling list.
>
User mode helpers is discussed in other threads, but we hadn't get a
conclusion to answer is user mode helpers better than letting kernel
support core_pattern in namespace, just as our discussing in this thread.
> > If above way is not acceptable, or impossible to realize, I also
> > agree your solution of limit vm setting pipe.
>
> I honestly think have a fully capable system that we have now that is
> capable of using setns and entering a containers context is better than
> something half baked. The solution either needs to support everything
> core_pattern does today but correctly in a container environment.
>
If we can fix problem of "the pipe dumping data to host filesystem",
both host and container will able to support full core_pattern.
> To make the case that something does not need to be supported, a
> convincing argument needs to be presented and tested that no one ever
> does that. Without such an argument you will be breaking userspace
> in a different way. Not actually fixing things.
>
It is same problem with above.
When we fixed it, all container can be free to set core_pattern without
breaking host env, and the every container manager don't need to add
special argument for setting core_pattern.
> My baseline reference implementation of all of this is that it is
> possible when a sufficiently privileged process writes to core_pattern
> to fork a child with the same environment and context as the writer.
> That forked child could then become a kernel thread and fork any
> additional children needed as user mode helpers.
>
Thanks for detailed explanation.
I'll investigate it is possible to write piped dump data to container's
filesystem.
We still have "container-write-to-host" problem even if we don't use this
patch, in current kernel, if we run a container with privilege,
1: container can modify core_pattern of host and other container
2: container can set core_pattern to pipe, then dump data to host filesystem
3: container can use this way to do more bad thing
Each of them are not accessable.
In summary:
+-------------+----------------+-------------+--------------------------+
| | CURRENT_KERNEL | AFTER_PATCH | AFTER_MORE_WORK_ON_PATCH |
|-------------+----------------+-------------+--------------------------+
| WITHOUT | SAFE | SAFE | SAFE |
| PRIVILEGE | | | |
|-------------+----------------+-------------+--------------------------+
| PRIVILEGE | DANGEROUS | SAFE | SAFE |
| DUMPTO FILE | | | |
|-------------+----------------+-------------+--------------------------+
| PRIVILEGE | DANGEROUS | DANGEROUS | SAFE |
| DUMPTO PIPE | | | |
+-------------+----------------+-------------+--------------------------+
So letting ns support core_pattern is also a bug fix for above case.
What is your opinion?
Any suggestions are welcome.
Thanks
Zhaolei
> Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] Make core_pattern support namespace
2016-02-16 14:26 ` [PATCH] " Mateusz Guzik
2016-02-17 20:15 ` Eric W. Biederman
@ 2016-02-18 11:13 ` Zhao Lei
1 sibling, 0 replies; 8+ messages in thread
From: Zhao Lei @ 2016-02-18 11:13 UTC (permalink / raw)
To: 'Mateusz Guzik'; +Cc: linux-kernel, containers
Hi, Mateusz Guzik
Thanks for your detailed comment and suggestion on this patch.
> -----Original Message-----
> From: Mateusz Guzik [mailto:mguzik@redhat.com]
> Sent: Tuesday, February 16, 2016 10:26 PM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: linux-kernel@vger.kernel.org; containers@lists.linux-foundation.org
> Subject: Re: [PATCH] Make core_pattern support namespace
>
> On Tue, Feb 16, 2016 at 07:33:39PM +0800, Zhao Lei wrote:
> > Currently, each container shared one copy of coredump setting
> > with the host system, if host system changed the setting, each
> > running containers will be affected.
> >
> > Moreover, it is not easy to let each container keeping their own
> > coredump setting.
> >
> > We can use some workaround as pipe program to make the second
> > requirement possible, but it is not simple, and both host and
> > container are limited to set to fixed pipe program.
> > In one word, for host running contailer, we can't change core_pattern
> > anymore.
> > To make the problem more hard, if a host running more than one
> > container product, each product will try to snatch the global
> > coredump setting to fit their own requirement.
> >
> > For container based on namespace design, it is good to allow
> > each container keeping their own coredump setting.
> >
> > It will bring us following benefit:
> > 1: Each container can change their own coredump setting
> > based on operation on /proc/sys/kernel/core_pattern
> > 2: Coredump setting changed in host will not affect
> > running containers.
> > 3: Support both case of "putting coredump in guest" and
> > "putting curedump in host".
> >
> > Each namespace-based software(lxc, docker, ..) can use this function
> > to custom their dump setting.
> >
> > And this function makes each continer working as separate system,
> > it fit for design goal of namespace.
> >
>
> Sorry if this is a false alarm, I don't have easy means to test it, but
> is not this an immediate privilege escalation?
>
> In particular, if the new pattern was to include a pipe, would not it
> cause the kernel to run whatever the namespace put in there, but on the
> "host"?
>
In my mind, if user don't run vm in with privilege request,
the container manager can do following processing:
1: User set custom core_pattern in starting container, as:
# lxc-start --core-pattern='xxx' -n vm01
or
# docker run --core-pattern='xxx' my_image
or
set CORE_PATTERN in vm's config file
2: Container manager(lxc or docker) mount /proc as rw in init period,
and set core_pattern.
3: Container manager remount /proc as ro, and give vm to user.
User/program in vm can not change core_pattern setting, just as lxc
and docker's current default behavor.
If user run vm with privilege request, it will be a problem.
User/process in vm can change core-pattern setting, if change to a file,
it is not a problem, because it is a file in vm's filesystem.
But if changed to a pipe, it is a problem in current kernel design,
the pipe program is in host's filesystem, it means the vm can change host.
In short:
Run vm in non-privilege | do anything | no problem
Run vm in privilege | change core_pattern to file | no problem
Run vm in privilege | change core_pattern to pipe | IS problem
As a solution, maybe we can modify kernel to search the pipe program
in vm's filesystem, and the pipe problem runs in vm's fs context,
so a vm can only put dump file in its private filesystem, except vm
manager link the dir into host.
Another solution is to only allow vm change core_pattern to file(avoid pipe),
but is looks strange, the vm should have same core_dump setting function
with normal system.
> The feature definitely looks useful, but this is another privileged
> operation which is now made available to unprivileged users. This poses
> some questions:
> - should the namespace be allowed to set anything it wants, including
> pipes? I would argue the latter should be disallowed for simplicity
Yes, current patch does allow, it can be fixed in above way.
> - now that the pattern can change at any time by namespace root, it
> becomes fishy to parse it for filename generation without any
> protection against concurrent modification
Before this patch, kernel can works corrent, so I think kernel already
have lock to avoid changing and reading the cure_pattern buffer in
same time.
This patch only modify the buffer place, so the existence lock will
still can work, I'll confirm it in source.
> - how do you ensure that namespaces which did not explicitely set their
> pattern still get updates from the host?
>
> That said, I suggest adding an allocated buffer to hold it or be NULL
> otherwise. If the pointer is NULL, the "host" pattern is used.
>
> For dumping purposes the code could just kmalloc, rcu_read_lock, memcpy
> and be done with it. Or, if it is acceptable given the size, just have a
> buffer on the stack.
>
> Finally, the code setting it could simply xchg the pointer to avoid
> locks if there is no good lock to be taken here, and then kfree_rcu the
> old buffer.
>
Actually I considered about this way in doing patch, as:
HOST core_pattern=foo
+- VM1 core_pattern=NULL
+- VM1_1 core_pattern=NULL
VM1 and VM1_1 use nearext core_pattern setting in tree(foo).
And if VM1_1 changed core pattern to bar:
HOST core_pattern=foo
+- VM1 core_pattern=NULL
+- VM1_1 core_pattern=bar
VM1 use foo and VM1_1 use bar.
It means vm can always set core_pattern buffer pointer to NULL in creating,
until someone changes it.
But I finally selected to clone core_pattern in creating vm, because if we
see a vm as a independent system, its core_pattern should only changed by
process running in it, and after a vm got created, the host will not able
to change vm's internal setting, except login/run_command in vm's context.
Thanks
Zhaolei
> Just my $0,03.
>
> --
> Mateusz Guzik
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-02-19 10:24 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-04 12:16 [PATCH] [RFC] Make core_pattern support namespace Zhao Lei
2016-02-16 14:26 ` [PATCH] " Mateusz Guzik
2016-02-17 20:15 ` Eric W. Biederman
2016-02-17 20:54 ` Mateusz Guzik
2016-02-18 11:59 ` Zhao Lei
2016-02-18 20:18 ` Eric W. Biederman
2016-02-19 10:24 ` Zhao Lei
2016-02-18 11:13 ` Zhao Lei
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).