* [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() @ 2009-10-09 4:50 KOSAKI Motohiro 2009-10-10 0:13 ` Andrew Morton 0 siblings, 1 reply; 13+ messages in thread From: KOSAKI Motohiro @ 2009-10-09 4:50 UTC (permalink / raw) To: Bryan Donlan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A ok, all lkml reviewer ack this patch. then, I've switched to linux-api review. Any comments? ================================================== From: Timo Sirainen <tss-X3B1VOXEql0@public.gmane.org> Currently glibc2 doesn't have setproctitle(3), so several userland daemons attempt to emulate it by doing some brutal stack modifications. This works most of the time, but it has problems. For example: % ps -ef |grep avahi-daemon avahi 1679 1 0 09:20 ? 00:00:00 avahi-daemon: running [kosadesk.local] # cat /proc/1679/cmdline avahi-daemon: running [kosadesk.local] This looks good, but the process has also overwritten its environment area and made the environ file useless: # cat /proc/1679/environ adesk.local] Another problem is that the process title length is limited by the size of the environment. Security conscious people try to avoid potential information leaks by clearing most of the environment before running a daemon: # env - MINIMUM_NEEDED_VAR=foo /path/to/daemon The resulting environment size may be too small to fit the wanted process titles. This patch makes it possible for userspace to implement setproctitle() cleanly. It adds a new PR_SET_PROCTITLE_AREA option for prctl(), which updates task's mm_struct->arg_start and arg_end to the given area. test_setproctitle.c ================================================ #define ERR(str) (perror(str), exit(1)) void settitle(char* title){ int err; err = prctl(34, title, strlen(title)+1); if (err < 0) ERR("prctl "); } void main(void){ long i; char buf[1024]; for (i = 0; i < 10000000000LL; i++){ sprintf(buf, "loooooooooooooooooooooooong string %d",i); settitle(buf); } } ================================================== Cc: Bryan Donlan <bdonlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: Ulrich Drepper <drepper-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Signed-off-by: Timo Sirainen <tss-X3B1VOXEql0@public.gmane.org> --- fs/proc/base.c | 57 +++++++++++++++++++++++++++++----------------- include/linux/mm_types.h | 2 + include/linux/prctl.h | 4 +++ kernel/fork.c | 1 + kernel/sys.c | 23 ++++++++++++++++++ 5 files changed, 66 insertions(+), 21 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 6f742f6..2f48440 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -255,32 +255,47 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + unsigned seq; + if (!mm) goto out; + + /* The process was not constructed yet? */ if (!mm->arg_end) - goto out_mm; /* Shh! No looking before we're done */ + goto out_mm; - len = mm->arg_end - mm->arg_start; - - if (len > PAGE_SIZE) - len = PAGE_SIZE; - - res = access_process_vm(task, mm->arg_start, buffer, len, 0); - - // If the nul at the end of args has been overwritten, then - // assume application is using setproctitle(3). - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { - len = strnlen(buffer, res); - if (len < res) { - res = len; - } else { - len = mm->env_end - mm->env_start; - if (len > PAGE_SIZE - res) - len = PAGE_SIZE - res; - res += access_process_vm(task, mm->env_start, buffer+res, len, 0); + do { + seq = read_seqbegin(&mm->arg_lock); + + len = mm->arg_end - mm->arg_start; + if (len > PAGE_SIZE) + len = PAGE_SIZE; + + res = access_process_vm(task, mm->arg_start, buffer, len, 0); + + if (mm->arg_end != mm->env_start) + /* PR_SET_PROCTITLE_AREA used */ res = strnlen(buffer, res); + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + /* + * If the nul at the end of args has been overwritten, + * then assume application is using sendmail's + * SPT_REUSEARGV style argv override. + */ + len = strnlen(buffer, res); + if (len < res) { + res = len; + } else { + len = mm->env_end - mm->env_start; + if (len > PAGE_SIZE - res) + len = PAGE_SIZE - res; + res += access_process_vm(task, mm->env_start, + buffer+res, len, 0); + res = strnlen(buffer, res); + } } - } + } while (read_seqretry(&mm->arg_lock, seq)); + out_mm: mmput(mm); out: diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 0042090..9daa1fa 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include <linux/completion.h> #include <linux/cpumask.h> #include <linux/page-debug-flags.h> +#include <linux/seqlock.h> #include <asm/page.h> #include <asm/mmu.h> @@ -236,6 +237,7 @@ struct mm_struct { unsigned long stack_vm, reserved_vm, def_flags, nr_ptes; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; + seqlock_t arg_lock; unsigned long arg_start, arg_end, env_start, env_end; unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ diff --git a/include/linux/prctl.h b/include/linux/prctl.h index b00df4c..feffb17 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -88,4 +88,8 @@ #define PR_TASK_PERF_COUNTERS_DISABLE 31 #define PR_TASK_PERF_COUNTERS_ENABLE 32 + +/* Set process title memory area for setproctitle() */ +#define PR_SET_PROCTITLE_AREA 34 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index bfee931..9eaa1cb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -435,6 +435,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p) mm->free_area_cache = TASK_UNMAPPED_BASE; mm->cached_hole_size = ~0UL; mm_init_owner(mm, p); + seqlock_init(&mm->arg_lock); if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; diff --git a/kernel/sys.c b/kernel/sys.c index b3f1097..e26c687 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1528,6 +1528,29 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, current->timer_slack_ns = arg2; error = 0; break; + case PR_SET_PROCTITLE_AREA: { + struct mm_struct *mm = current->mm; + unsigned long addr = arg2; + unsigned long len = arg3; + unsigned long end = arg2 + arg3; + + if (len > PAGE_SIZE) + return -EINVAL; + + if (addr >= end) + return -EINVAL; + + if (!access_ok(VERIFY_READ, addr, len)) { + return -EFAULT; + } + + write_seqlock(&mm->arg_lock); + mm->arg_start = addr; + mm->arg_end = addr + len; + write_sequnlock(&mm->arg_lock); + + return 0; + } default: error = -EINVAL; break; -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() 2009-10-09 4:50 [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() KOSAKI Motohiro @ 2009-10-10 0:13 ` Andrew Morton 2009-10-10 2:22 ` Bryan Donlan 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2009-10-10 0:13 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: Bryan Donlan, linux-kernel, Ulrich Drepper, linux-api On Fri, 9 Oct 2009 13:50:17 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > ok, all lkml reviewer ack this patch. > then, I've switched to linux-api review. > > Any comments? > > > > ================================================== > From: Timo Sirainen <tss@iki.fi> > > Currently glibc2 doesn't have setproctitle(3), so several userland > daemons attempt to emulate it by doing some brutal stack modifications. > This works most of the time, but it has problems. For example: > > % ps -ef |grep avahi-daemon > avahi 1679 1 0 09:20 ? 00:00:00 avahi-daemon: running [kosadesk.local] > > # cat /proc/1679/cmdline > avahi-daemon: running [kosadesk.local] > > This looks good, but the process has also overwritten its environment > area and made the environ file useless: > > # cat /proc/1679/environ > adesk.local] > > Another problem is that the process title length is limited by the size of > the environment. Security conscious people try to avoid potential information > leaks by clearing most of the environment before running a daemon: > > # env - MINIMUM_NEEDED_VAR=foo /path/to/daemon > > The resulting environment size may be too small to fit the wanted process > titles. > > This patch makes it possible for userspace to implement setproctitle() > cleanly. It adds a new PR_SET_PROCTITLE_AREA option for prctl(), which > updates task's mm_struct->arg_start and arg_end to the given area. > > test_setproctitle.c > ================================================ > #define ERR(str) (perror(str), exit(1)) > > void settitle(char* title){ > int err; > > err = prctl(34, title, strlen(title)+1); > if (err < 0) > ERR("prctl "); > } > > void main(void){ > long i; > char buf[1024]; > > for (i = 0; i < 10000000000LL; i++){ > sprintf(buf, "loooooooooooooooooooooooong string %d",i); > settitle(buf); > } > } gee that's a good changelog. > ... > > @@ -255,32 +255,47 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) > int res = 0; > unsigned int len; > struct mm_struct *mm = get_task_mm(task); > + unsigned seq; > + > if (!mm) > goto out; > + > + /* The process was not constructed yet? */ > if (!mm->arg_end) > - goto out_mm; /* Shh! No looking before we're done */ > + goto out_mm; > > - len = mm->arg_end - mm->arg_start; > - > - if (len > PAGE_SIZE) > - len = PAGE_SIZE; > - > - res = access_process_vm(task, mm->arg_start, buffer, len, 0); > - > - // If the nul at the end of args has been overwritten, then > - // assume application is using setproctitle(3). > - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { > - len = strnlen(buffer, res); > - if (len < res) { > - res = len; > - } else { > - len = mm->env_end - mm->env_start; > - if (len > PAGE_SIZE - res) > - len = PAGE_SIZE - res; > - res += access_process_vm(task, mm->env_start, buffer+res, len, 0); > + do { > + seq = read_seqbegin(&mm->arg_lock); > + > + len = mm->arg_end - mm->arg_start; > + if (len > PAGE_SIZE) > + len = PAGE_SIZE; > + > + res = access_process_vm(task, mm->arg_start, buffer, len, 0); > + > + if (mm->arg_end != mm->env_start) > + /* PR_SET_PROCTITLE_AREA used */ > res = strnlen(buffer, res); Hang on. If PR_SET_PROCTITLE_AREA installed a bad address then access_process_vm() will return -EFAULT and nothing was written to `buffer'? Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and arg_end to have inconsistent/incoherent values. This might result in a fault but it also might result in a read of garbage userspace memory. I guess all of these issues could be written off as "userspace bugs", but our behaviour here isn't nice. We should at least return that -EFAULT!. > + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { > + /* > + * If the nul at the end of args has been overwritten, > + * then assume application is using sendmail's > + * SPT_REUSEARGV style argv override. > + */ > + len = strnlen(buffer, res); > + if (len < res) { > + res = len; > + } else { > + len = mm->env_end - mm->env_start; > + if (len > PAGE_SIZE - res) > + len = PAGE_SIZE - res; > + res += access_process_vm(task, mm->env_start, > + buffer+res, len, 0); > + res = strnlen(buffer, res); > + } And if access_process_vm() returns a -ve errno here, the code simply flies off into the weeds. > } > - } > + } while (read_seqretry(&mm->arg_lock, seq)); > + > out_mm: > mmput(mm); > > ... > > @@ -236,6 +237,7 @@ struct mm_struct { > unsigned long stack_vm, reserved_vm, def_flags, nr_ptes; > unsigned long start_code, end_code, start_data, end_data; > unsigned long start_brk, brk, start_stack; > + seqlock_t arg_lock; > unsigned long arg_start, arg_end, env_start, env_end; > > unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ seqlocks are nice but I have to wonder if they made this patch unnecessarily complex. Couldn't we just do: PR_SET_PROCTITLE_AREA: spin_lock(mm->lock); mm->arg_start = addr; mm->arg_end = addr + len; spin_unlock(mm->lock); and proc_pid_cmdline() { unsigned long addr; unsigned long end; spin_lock(mm->lock); addr = mm->arg_start; end = mm->arg_end; spin_unlock(mm->lock); <use `addr' and `len'> } ? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() 2009-10-10 0:13 ` Andrew Morton @ 2009-10-10 2:22 ` Bryan Donlan [not found] ` <3e8340490910091922g7891b31al649e91f15ffae687-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Bryan Donlan @ 2009-10-10 2:22 UTC (permalink / raw) To: Andrew Morton; +Cc: KOSAKI Motohiro, linux-kernel, Ulrich Drepper, linux-api On Fri, Oct 9, 2009 at 8:13 PM, Andrew Morton <akpm@linux-foundation.org> wrote: >> + res = access_process_vm(task, mm->arg_start, buffer, len, 0); >> + >> + if (mm->arg_end != mm->env_start) >> + /* PR_SET_PROCTITLE_AREA used */ >> res = strnlen(buffer, res); > > Hang on. > > If PR_SET_PROCTITLE_AREA installed a bad address then > access_process_vm() will return -EFAULT and nothing was written to > `buffer'? > > Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and > arg_end to have inconsistent/incoherent values. This might result in a > fault but it also might result in a read of garbage userspace memory. > > I guess all of these issues could be written off as "userspace bugs", > but our behaviour here isn't nice. We should at least return that > -EFAULT!. access_process_vm() does not return an error code - just the number of bytes successfully transferred. If there's a fault, we just get 0 (or some intermediate value) back (as discussed on lkml). >> + res += access_process_vm(task, mm->env_start, >> + buffer+res, len, 0); >> + res = strnlen(buffer, res); >> + } > > And if access_process_vm() returns a -ve errno here, the code simply > flies off into the weeds. This code was originally there - it's just been lifted into an if :) and since access_process_vm will never return a negative errno, it's not a problem. Now, there might be a case for arguing that we should try harder to get an error code (-EIO if we don't read the number of bytes we asked for), but /proc/pid/cmdline never has before, and that would then make this a visible change for consumers of /proc/pid/cmdline. Can ps handle reading cmdline returning -EIO? > seqlocks are nice but I have to wonder if they made this patch > unnecessarily complex. Couldn't we just do: > > PR_SET_PROCTITLE_AREA: > > spin_lock(mm->lock); > mm->arg_start = addr; > mm->arg_end = addr + len; > spin_unlock(mm->lock); > > and > > proc_pid_cmdline() > { > unsigned long addr; > unsigned long end; > > spin_lock(mm->lock); > addr = mm->arg_start; > end = mm->arg_end; > spin_unlock(mm->lock); > > <use `addr' and `len'> > } > > ? As discussed on the lkml thread, this opens up a nasty race: Process A: calls PR_SET_PROCTITLE_AREA Process B: read cmdline: Process B: spin_lock Process B: read mm->arg_* Process B: unlock Process A: mm->arg_* = ... Process A: free(old_cmdline_area) Process A: secret_password = malloc(...) Process A: strcpy(secret_password, stuff); Process B: access_process_vm If secret_password == arg_start, then process B just read secret information from process A's address space. Since process A has no idea when or even if process B will complete its read, it has no way of protecting itself from this, save from never reusing any memory which has ever been assigned to a proctitle area. The solution is to use the seqlock to detect this, and prevent the secret information from ever making it back to process B's userspace. Note that it's not enough to just recheck arg_start, as process A may reassign the proctitle area back to its original position after having it somewhere else for a while. ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <3e8340490910091922g7891b31al649e91f15ffae687-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <3e8340490910091922g7891b31al649e91f15ffae687-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-10-10 2:42 ` Andrew Morton [not found] ` <20091009194250.eb76e338.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2009-10-10 2:42 UTC (permalink / raw) To: Bryan Donlan Cc: KOSAKI Motohiro, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA On Fri, 9 Oct 2009 22:22:10 -0400 Bryan Donlan <bdonlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > On Fri, Oct 9, 2009 at 8:13 PM, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote: > > >> + __ __ __ __ __ __ res = access_process_vm(task, mm->arg_start, buffer, len, 0); > >> + > >> + __ __ __ __ __ __ if (mm->arg_end != mm->env_start) > >> + __ __ __ __ __ __ __ __ __ __ /* PR_SET_PROCTITLE_AREA used */ > >> __ __ __ __ __ __ __ __ __ __ __ res = strnlen(buffer, res); > > > > Hang on. > > > > If PR_SET_PROCTITLE_AREA installed a bad address then > > access_process_vm() will return -EFAULT and nothing was written to > > `buffer'? > > > > Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and > > arg_end to have inconsistent/incoherent values. __This might result in a > > fault but it also might result in a read of garbage userspace memory. > > > > I guess all of these issues could be written off as "userspace bugs", > > but our behaviour here isn't nice. __We should at least return that > > -EFAULT!. > > access_process_vm() does not return an error code - just the number of > bytes successfully transferred. If there's a fault, we just get 0 (or > some intermediate value) back OK. > (as discussed on lkml). That's not code documentation :( > >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start, Your email client is converting tabs to non-ascii crap. gmail. Sigh. > > __ __ __ __ __ __ __ __mm->arg_start = addr; > > __ __ __ __ __ __ __ __mm->arg_end = addr + len; > > __ __ __ __ __ __ __ __spin_unlock(mm->lock); > > > > and > > > > __ __ __ __proc_pid_cmdline() > > __ __ __ __{ > > __ __ __ __ __ __ __ __unsigned long addr; > > __ __ __ __ __ __ __ __unsigned long end; > > > > __ __ __ __ __ __ __ __spin_lock(mm->lock); > > __ __ __ __ __ __ __ __addr = mm->arg_start; > > __ __ __ __ __ __ __ __end = mm->arg_end; > > __ __ __ __ __ __ __ __spin_unlock(mm->lock); > > > > __ __ __ __ __ __ __ __<use `addr' and `len'> > > __ __ __ __} > > > > ? > > As discussed on the lkml thread, this opens up a nasty race: > > Process A: calls PR_SET_PROCTITLE_AREA > Process B: read cmdline: > Process B: spin_lock > Process B: read mm->arg_* > Process B: unlock > Process A: mm->arg_* = ... > Process A: free(old_cmdline_area) > Process A: secret_password = malloc(...) > Process A: strcpy(secret_password, stuff); > Process B: access_process_vm > > If secret_password == arg_start, then process B just read secret > information from process A's address space. Since process A has no > idea when or even if process B will complete its read, it has no way > of protecting itself from this, save from never reusing any memory > which has ever been assigned to a proctitle area. OK. But there's no way in which the reader of either the patch or the resulting code can discover this subtlety. > The solution is to use the seqlock to detect this, and prevent the > secret information from ever making it back to process B's userspace. > Note that it's not enough to just recheck arg_start, as process A may > reassign the proctitle area back to its original position after having > it somewhere else for a while. Well seqlock is _a_ solution. Another is to use a mutex or an rwsem around the whole operation. With the code as you propose it, what happens if a process sits in a tight loop running setproctitle? Do other processes running `ps' get stuck in a livelock until the offending process gets scheduled out? -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20091009194250.eb76e338.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <20091009194250.eb76e338.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> @ 2009-10-10 2:57 ` Bryan Donlan 2009-10-10 6:32 ` KOSAKI Motohiro 0 siblings, 1 reply; 13+ messages in thread From: Bryan Donlan @ 2009-10-10 2:57 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA On Fri, Oct 9, 2009 at 10:42 PM, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote: >> >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start, > > Your email client is converting tabs to non-ascii crap. gmail. Sigh. Weird ... I'll have to see if I can do something about that :/ > OK. > > But there's no way in which the reader of either the patch or the > resulting code can discover this subtlety. I didn't write the log message or the code - I just mentioned these same issues back in the lkml thread :) But yes, this should be mentioned somewhere. >> The solution is to use the seqlock to detect this, and prevent the >> secret information from ever making it back to process B's userspace. >> Note that it's not enough to just recheck arg_start, as process A may >> reassign the proctitle area back to its original position after having >> it somewhere else for a while. > > Well seqlock is _a_ solution. Another is to use a mutex or an rwsem > around the whole operation. > > With the code as you propose it, what happens if a process sits in a > tight loop running setproctitle? Do other processes running `ps' get > stuck in a livelock until the offending process gets scheduled out? It does seem like a maximum spin count should be put in there - and maybe a timeout as well (since with FUSE etc it's possible to engineer page faults that take arbitrarily long). Also, it occurs to me that: > + do { > + seq = read_seqbegin(&mm->arg_lock); > + > + len = mm->arg_end - mm->arg_start; > + if (len > PAGE_SIZE) > + len = PAGE_SIZE; If arg_end or arg_start are modified after this, is it truly safe to assume that len will remain <= PAGE_SIZE without a memory barrier before the conditional? -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() 2009-10-10 2:57 ` Bryan Donlan @ 2009-10-10 6:32 ` KOSAKI Motohiro [not found] ` <2f11576a0910092332s6e0e3dcs35864e3a2164be0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2009-10-10 7:11 ` Bryan Donlan 0 siblings, 2 replies; 13+ messages in thread From: KOSAKI Motohiro @ 2009-10-10 6:32 UTC (permalink / raw) To: Bryan Donlan Cc: Andrew Morton, linux-kernel, Ulrich Drepper, linux-api, Timo Sirainen (Doh, I've forgot cc to original author. sorry. cc to timo.) Hi very interesting discussion. great. 2009/10/10 Bryan Donlan <bdonlan@gmail.com>: > On Fri, Oct 9, 2009 at 10:42 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: > >>> >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start, >> >> Your email client is converting tabs to non-ascii crap. gmail. Sigh. > > Weird ... I'll have to see if I can do something about that :/ > >> OK. >> >> But there's no way in which the reader of either the patch or the >> resulting code can discover this subtlety. > > I didn't write the log message or the code - I just mentioned these > same issues back in the lkml thread :) But yes, this should be > mentioned somewhere. > This is documented in access_process_vm. 3224/* 3225 * Access another process' address space. 3226 * Source/target buffer must be kernel space, 3227 * Do not walk the page table directly, use get_user_pages 3228 */ 3229int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write) 3230{ 3231 struct mm_struct *mm; 3232 struct vm_area_struct *vma; 3233 void *old_buf = buf; 3234 3235 mm = get_task_mm(tsk); 3236 if (!mm) 3237 return 0; 3238 3239 down_read(&mm->mmap_sem); 3240 /* ignore errors, just check how much was successfully transferred */ 3241 while (len) { However, yes, it is pretty bad place ;) /* ignore errors, just check how much was successfully transferred */ comment should be placed in function header comment. I think separate another patch is better. >>> The solution is to use the seqlock to detect this, and prevent the >>> secret information from ever making it back to process B's userspace. >>> Note that it's not enough to just recheck arg_start, as process A may >>> reassign the proctitle area back to its original position after having >>> it somewhere else for a while. >> >> Well seqlock is _a_ solution. Another is to use a mutex or an rwsem >> around the whole operation. >> >> With the code as you propose it, what happens if a process sits in a >> tight loop running setproctitle? Do other processes running `ps' get >> stuck in a livelock until the offending process gets scheduled out? > > It does seem like a maximum spin count should be put in there - and > maybe a timeout as well (since with FUSE etc it's possible to engineer > page faults that take arbitrarily long). > Also, it occurs to me that: makes sense. I like maximum spin rather than timeout. >> + do { >> + seq = read_seqbegin(&mm->arg_lock); >> + >> + len = mm->arg_end - mm->arg_start; >> + if (len > PAGE_SIZE) >> + len = PAGE_SIZE; > > If arg_end or arg_start are modified after this, is it truly safe to > assume that len will remain <= PAGE_SIZE without a memory barrier > before the conditional? 1) access_process_vm() doesn't return error value. 2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len. then, if arg_{start,end} is modified, access_process_vm() may return 0 and strnlen makes bad calculation, but read_seqretry() can detect its modify rightly. I think. ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <2f11576a0910092332s6e0e3dcs35864e3a2164be0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <2f11576a0910092332s6e0e3dcs35864e3a2164be0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-10-10 6:39 ` Andrew Morton 2009-10-12 19:03 ` KOSAKI Motohiro 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2009-10-10 6:39 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Bryan Donlan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA, Timo Sirainen On Sat, 10 Oct 2009 15:32:35 +0900 KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: > >>> The solution is to use the seqlock to detect this, and prevent the > >>> secret information from ever making it back to process B's userspace. > >>> Note that it's not enough to just recheck arg_start, as process A may > >>> reassign the proctitle area back to its original position after having > >>> it somewhere else for a while. > >> > >> Well seqlock is _a_ solution. __Another is to use a mutex or an rwsem > >> around the whole operation. > >> > >> With the code as you propose it, what happens if a process sits in a > >> tight loop running setproctitle? __Do other processes running `ps' get > >> stuck in a livelock until the offending process gets scheduled out? > > > > It does seem like a maximum spin count should be put in there - and > > maybe a timeout as well (since with FUSE etc it's possible to engineer > > page faults that take arbitrarily long). > > Also, it occurs to me that: > > makes sense. > I like maximum spin rather than timeout. Start simple. What's wrong with mutex_lock() on the reader and writer sides? rwsems might be OK too. In both cases we should think about whether persistent readers can block the writer excessively though. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() 2009-10-10 6:39 ` Andrew Morton @ 2009-10-12 19:03 ` KOSAKI Motohiro [not found] ` <20091013022335.C741.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: KOSAKI Motohiro @ 2009-10-12 19:03 UTC (permalink / raw) To: Andrew Morton Cc: kosaki.motohiro, Bryan Donlan, linux-kernel, Ulrich Drepper, linux-api, Timo Sirainen [-- Attachment #1: Type: text/plain, Size: 2141 bytes --] Hi > > >>> The solution is to use the seqlock to detect this, and prevent the > > >>> secret information from ever making it back to process B's userspace. > > >>> Note that it's not enough to just recheck arg_start, as process A may > > >>> reassign the proctitle area back to its original position after having > > >>> it somewhere else for a while. > > >> > > >> Well seqlock is _a_ solution. __Another is to use a mutex or an rwsem > > >> around the whole operation. > > >> > > >> With the code as you propose it, what happens if a process sits in a > > >> tight loop running setproctitle? __Do other processes running `ps' get > > >> stuck in a livelock until the offending process gets scheduled out? > > > > > > It does seem like a maximum spin count should be put in there - and > > > maybe a timeout as well (since with FUSE etc it's possible to engineer > > > page faults that take arbitrarily long). > > > Also, it occurs to me that: > > > > makes sense. > > I like maximum spin rather than timeout. > > Start simple. What's wrong with mutex_lock() on the reader and writer > sides? rwsems might be OK too. > > In both cases we should think about whether persistent readers can > block the writer excessively though. I thought your mention seems reasonable. then I mesured various locking performance. no-contention read-read contetion read-write contention w/o patch 4627 ms 7575 ms N/A mutex 5717 ms 33872 ms (!) 14793 ms rw-semaphoe 6846 ms 10734 ms 36156 ms (!) seqlock 4754 ms 7558 ms 9373 ms Umm, seqlock is significantly better than other. <testcase> readtitle.c read proctitle 1,000,000 times setproctitle.c infinite loop of setproctitle() no-contention: ./readtitle 1 read-read contention: ./readtitle 1 &; ./readtitle 1&; wait read-write contention ./setproctitle [switch other terminal] ./readtitle `pidof setproctitle` I agree this testcase is too pessimistic. ps doesn't read /proc/{pid}/cmdline so frequently. however if we need to concern DoS attack, we need to mesure pessimistic scenario. Plus, this result indicate setproctitle-seqlock doesn't need timeout nor max spin. [-- Attachment #2: setproctitle-mutex.patch --] [-- Type: application/octet-stream, Size: 3760 bytes --] diff --git a/fs/proc/base.c b/fs/proc/base.c index 837469a..3470ee3 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -255,32 +255,44 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + if (!mm) goto out; + + /* The process was not constructed yet? */ if (!mm->arg_end) goto out_mm; /* Shh! No looking before we're done */ - len = mm->arg_end - mm->arg_start; - + mutex_lock(&mm->arg_lock); + len = mm->arg_end - mm->arg_start; if (len > PAGE_SIZE) len = PAGE_SIZE; - + res = access_process_vm(task, mm->arg_start, buffer, len, 0); - // If the nul at the end of args has been overwritten, then - // assume application is using setproctitle(3). - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + if (mm->arg_end != mm->env_start) + /* PR_SET_PROCTITLE_AREA used */ + res = strnlen(buffer, res); + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + /* + * If the nul at the end of args has been overwritten, + * then assume application is using sendmail's + * SPT_REUSEARGV style argv override. + */ len = strnlen(buffer, res); if (len < res) { - res = len; + res = len; } else { len = mm->env_end - mm->env_start; if (len > PAGE_SIZE - res) len = PAGE_SIZE - res; - res += access_process_vm(task, mm->env_start, buffer+res, len, 0); + res += access_process_vm(task, mm->env_start, + buffer+res, len, 0); res = strnlen(buffer, res); } } + mutex_unlock(&mm->arg_lock); + out_mm: mmput(mm); out: diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 84a524a..3e2a346 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include <linux/completion.h> #include <linux/cpumask.h> #include <linux/page-debug-flags.h> +#include <linux/mutex.h> #include <asm/page.h> #include <asm/mmu.h> @@ -236,6 +237,7 @@ struct mm_struct { unsigned long stack_vm, reserved_vm, def_flags, nr_ptes; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; + struct mutex arg_lock; unsigned long arg_start, arg_end, env_start, env_end; unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ diff --git a/include/linux/prctl.h b/include/linux/prctl.h index 9311505..e80a11b 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -90,4 +90,8 @@ #define PR_MCE_KILL 33 + +/* Set process title memory area for setproctitle() */ +#define PR_SET_PROCTITLE_AREA 34 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 266c6af..f6ebe46 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -459,6 +459,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p) mm->cached_hole_size = ~0UL; mm_init_aio(mm); mm_init_owner(mm, p); + mutex_init(&mm->arg_lock); if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; diff --git a/kernel/sys.c b/kernel/sys.c index 255475d..e401e54 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1564,6 +1564,29 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = 0; break; + case PR_SET_PROCTITLE_AREA: { + struct mm_struct *mm = current->mm; + unsigned long addr = arg2; + unsigned long len = arg3; + unsigned long end = arg2 + arg3; + + if (len > PAGE_SIZE) + return -EINVAL; + + if (addr >= end) + return -EINVAL; + + if (!access_ok(VERIFY_READ, addr, len)) { + return -EFAULT; + } + + mutex_lock(&mm->arg_lock); + mm->arg_start = addr; + mm->arg_end = addr + len; + mutex_unlock(&mm->arg_lock); + + return 0; + } default: error = -EINVAL; break; [-- Attachment #3: setproctitle-rwmutex.patch --] [-- Type: application/octet-stream, Size: 3378 bytes --] diff --git a/fs/proc/base.c b/fs/proc/base.c index 837469a..e95878a 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -255,21 +255,30 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + if (!mm) goto out; + + /* The process was not constructed yet? */ if (!mm->arg_end) goto out_mm; /* Shh! No looking before we're done */ - len = mm->arg_end - mm->arg_start; - + down_read(&mm->arg_sem); + len = mm->arg_end - mm->arg_start; if (len > PAGE_SIZE) len = PAGE_SIZE; - + res = access_process_vm(task, mm->arg_start, buffer, len, 0); - // If the nul at the end of args has been overwritten, then - // assume application is using setproctitle(3). - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + if (mm->arg_end != mm->env_start) + /* PR_SET_PROCTITLE_AREA used */ + res = strnlen(buffer, res); + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + /* + * If the nul at the end of args has been overwritten, + * then assume application is using sendmail's + * SPT_REUSEARGV style argv override. + */ len = strnlen(buffer, res); if (len < res) { res = len; @@ -281,6 +290,8 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) res = strnlen(buffer, res); } } + up_read(&mm->arg_sem); + out_mm: mmput(mm); out: diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 84a524a..aff2e8d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -236,6 +236,7 @@ struct mm_struct { unsigned long stack_vm, reserved_vm, def_flags, nr_ptes; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; + struct rw_semaphore arg_sem; unsigned long arg_start, arg_end, env_start, env_end; unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ diff --git a/include/linux/prctl.h b/include/linux/prctl.h index 9311505..e80a11b 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -90,4 +90,8 @@ #define PR_MCE_KILL 33 + +/* Set process title memory area for setproctitle() */ +#define PR_SET_PROCTITLE_AREA 34 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 266c6af..014999c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -459,6 +459,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p) mm->cached_hole_size = ~0UL; mm_init_aio(mm); mm_init_owner(mm, p); + init_rwsem(&mm->arg_sem); if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; diff --git a/kernel/sys.c b/kernel/sys.c index 255475d..cd69a06 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1564,6 +1564,29 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = 0; break; + case PR_SET_PROCTITLE_AREA: { + struct mm_struct *mm = current->mm; + unsigned long addr = arg2; + unsigned long len = arg3; + unsigned long end = arg2 + arg3; + + if (len > PAGE_SIZE) + return -EINVAL; + + if (addr >= end) + return -EINVAL; + + if (!access_ok(VERIFY_READ, addr, len)) { + return -EFAULT; + } + + down_write(&mm->arg_sem); + mm->arg_start = addr; + mm->arg_end = addr + len; + up_write(&mm->arg_sem); + + return 0; + } default: error = -EINVAL; break; [-- Attachment #4: setproctitle-seqlock.patch --] [-- Type: application/octet-stream, Size: 4138 bytes --] diff --git a/fs/proc/base.c b/fs/proc/base.c index 837469a..5ac6ece 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -255,32 +255,47 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer) int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + unsigned seq; + if (!mm) goto out; + + /* The process was not constructed yet? */ if (!mm->arg_end) - goto out_mm; /* Shh! No looking before we're done */ + goto out_mm; - len = mm->arg_end - mm->arg_start; - - if (len > PAGE_SIZE) - len = PAGE_SIZE; - - res = access_process_vm(task, mm->arg_start, buffer, len, 0); - - // If the nul at the end of args has been overwritten, then - // assume application is using setproctitle(3). - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { - len = strnlen(buffer, res); - if (len < res) { - res = len; - } else { - len = mm->env_end - mm->env_start; - if (len > PAGE_SIZE - res) - len = PAGE_SIZE - res; - res += access_process_vm(task, mm->env_start, buffer+res, len, 0); + do { + seq = read_seqbegin(&mm->arg_lock); + + len = mm->arg_end - mm->arg_start; + if (len > PAGE_SIZE) + len = PAGE_SIZE; + + res = access_process_vm(task, mm->arg_start, buffer, len, 0); + + if (mm->arg_end != mm->env_start) + /* PR_SET_PROCTITLE_AREA used */ res = strnlen(buffer, res); + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) { + /* + * If the nul at the end of args has been overwritten, + * then assume application is using sendmail's + * SPT_REUSEARGV style argv override. + */ + len = strnlen(buffer, res); + if (len < res) { + res = len; + } else { + len = mm->env_end - mm->env_start; + if (len > PAGE_SIZE - res) + len = PAGE_SIZE - res; + res += access_process_vm(task, mm->env_start, + buffer+res, len, 0); + res = strnlen(buffer, res); + } } - } + } while (read_seqretry(&mm->arg_lock, seq)); + out_mm: mmput(mm); out: diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 84a524a..279d620 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include <linux/completion.h> #include <linux/cpumask.h> #include <linux/page-debug-flags.h> +#include <linux/seqlock.h> #include <asm/page.h> #include <asm/mmu.h> @@ -236,6 +237,7 @@ struct mm_struct { unsigned long stack_vm, reserved_vm, def_flags, nr_ptes; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; + seqlock_t arg_lock; unsigned long arg_start, arg_end, env_start, env_end; unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ diff --git a/include/linux/prctl.h b/include/linux/prctl.h index 9311505..e80a11b 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -90,4 +90,8 @@ #define PR_MCE_KILL 33 + +/* Set process title memory area for setproctitle() */ +#define PR_SET_PROCTITLE_AREA 34 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 266c6af..13089e3 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -459,6 +459,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p) mm->cached_hole_size = ~0UL; mm_init_aio(mm); mm_init_owner(mm, p); + seqlock_init(&mm->arg_lock); if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; diff --git a/kernel/sys.c b/kernel/sys.c index 255475d..83724d2 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1563,7 +1563,29 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, } error = 0; break; + case PR_SET_PROCTITLE_AREA: { + struct mm_struct *mm = current->mm; + unsigned long addr = arg2; + unsigned long len = arg3; + unsigned long end = arg2 + arg3; + if (len > PAGE_SIZE) + return -EINVAL; + + if (addr >= end) + return -EINVAL; + + if (!access_ok(VERIFY_READ, addr, len)) { + return -EFAULT; + } + + write_seqlock(&mm->arg_lock); + mm->arg_start = addr; + mm->arg_end = addr + len; + write_sequnlock(&mm->arg_lock); + + return 0; + } default: error = -EINVAL; break; [-- Attachment #5: readtitle.c --] [-- Type: application/octet-stream, Size: 672 bytes --] #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/time.h> #define LOOP (1000000) int main(int argc, char** argv) { char path[128]; char buf[1024]; FILE* file; int i; struct timeval start, end; long long elaps; sprintf(path, "/proc/%s/cmdline", argv[1]); gettimeofday(&start, NULL); file = fopen(path, "r"); for (i=0; i<LOOP; i++) { fread(buf, 1024, 1, file); } gettimeofday(&end, NULL); printf("%s\n", buf); elaps = end.tv_sec * 1000 + end.tv_usec /1000; elaps -= start.tv_sec * 1000; elaps -= start.tv_usec / 1000; printf("time = %lld ms\n", elaps); return 0; } [-- Attachment #6: setproctitle.c --] [-- Type: application/octet-stream, Size: 571 bytes --] #include <string.h> #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <sys/prctl.h> #define ERR(str) (perror(str), exit(1)) void settitle(char* title){ int err; err = prctl(34, title, strlen(title)+1); if (err < 0) ERR("prctl "); } int main(void){ long i; char buf[1024]; for (i = 0; i < 10000000000LL; i++){ sprintf(buf, "loooooooooooooooooooooooong string %ld",i); settitle(buf); } return 0; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
[parent not found: <20091013022335.C741.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <20091013022335.C741.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-10-12 19:22 ` Andrew Morton [not found] ` <20091012122246.a941013b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2009-10-12 19:22 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Bryan Donlan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA, Timo Sirainen On Tue, 13 Oct 2009 04:03:45 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: > > Start simple. What's wrong with mutex_lock() on the reader and writer > > sides? rwsems might be OK too. > > > > In both cases we should think about whether persistent readers can > > block the writer excessively though. > > I thought your mention seems reasonable. then I mesured various locking > performance. > > no-contention read-read contetion read-write contention > w/o patch 4627 ms 7575 ms N/A > mutex 5717 ms 33872 ms (!) 14793 ms > rw-semaphoe 6846 ms 10734 ms 36156 ms (!) > seqlock 4754 ms 7558 ms 9373 ms > > Umm, seqlock is significantly better than other. Sure, but even the worst case there is 1,000,000 operations in 34 seconds (yes?). 33 microseconds for a /proc read while under a specific local DoS attack is OK! If so then all implementations are acceptable and we should choose the simplest, most-obviously-correct one. ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20091012122246.a941013b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <20091012122246.a941013b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> @ 2009-10-13 0:03 ` KOSAKI Motohiro 0 siblings, 0 replies; 13+ messages in thread From: KOSAKI Motohiro @ 2009-10-13 0:03 UTC (permalink / raw) To: Andrew Morton Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Bryan Donlan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA, Timo Sirainen > On Tue, 13 Oct 2009 04:03:45 +0900 (JST) > KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: > > > > Start simple. What's wrong with mutex_lock() on the reader and writer > > > sides? rwsems might be OK too. > > > > > > In both cases we should think about whether persistent readers can > > > block the writer excessively though. > > > > I thought your mention seems reasonable. then I mesured various locking > > performance. > > > > no-contention read-read contetion read-write contention > > w/o patch 4627 ms 7575 ms N/A > > mutex 5717 ms 33872 ms (!) 14793 ms > > rw-semaphoe 6846 ms 10734 ms 36156 ms (!) > > seqlock 4754 ms 7558 ms 9373 ms > > > > Umm, seqlock is significantly better than other. > > Sure, but even the worst case there is 1,000,000 operations in 34 > seconds (yes?). 33 microseconds for a /proc read while under a specific > local DoS attack is OK! > > If so then all implementations are acceptable and we should choose the > simplest, most-obviously-correct one. Hm, ok! I had guessed you don't accept this slowness. but my guess was wrong. I have no objection to use rw-semaphoe if you accept it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() 2009-10-10 6:32 ` KOSAKI Motohiro [not found] ` <2f11576a0910092332s6e0e3dcs35864e3a2164be0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-10-10 7:11 ` Bryan Donlan [not found] ` <3e8340490910100011u17497293o613334c64f1543c8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 13+ messages in thread From: Bryan Donlan @ 2009-10-10 7:11 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, linux-kernel, Ulrich Drepper, linux-api, Timo Sirainen On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> It does seem like a maximum spin count should be put in there - and >> maybe a timeout as well (since with FUSE etc it's possible to engineer >> page faults that take arbitrarily long). >> Also, it occurs to me that: > > makes sense. > I like maximum spin rather than timeout. I'm worried about the scenario where process A sets its cmdline buffer to point to a page which will take a _VERY_ long time to pagein (maybe forever), and then process B goes to try to read its cmdline. What happens now? Process A can arrange for this to happen by using a FUSE filesystem that sits on a read forever. And since the first thing the admin's likely to do to track down the problem is 'ps awux', this is liable to be a rather nasty DoS... Of course, this is no worse than it is now - it's already possible to replace the page in question. But we should think about ways this could be fixed for good... > >>> + do { >>> + seq = read_seqbegin(&mm->arg_lock); >>> + >>> + len = mm->arg_end - mm->arg_start; >>> + if (len > PAGE_SIZE) >>> + len = PAGE_SIZE; >> >> If arg_end or arg_start are modified after this, is it truly safe to >> assume that len will remain <= PAGE_SIZE without a memory barrier >> before the conditional? > > 1) access_process_vm() doesn't return error value. > 2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len. > > then, if arg_{start,end} is modified, access_process_vm() may return 0 > and strnlen > makes bad calculation, but read_seqretry() can detect its modify > rightly. I think. No, I'm worried about what if the compiler decides to rewrite like so: if (mm->arg_end - mm->arg_start > PAGE_SIZE) len = PAGE_SIZE; else /* here we reload arg_end/arg_start! */ len = mm->arg_end - mm->arg_start; Now we might write into buffer more than PAGE_SIZE bytes, which is probably a buffer overrun into kernel space... ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <3e8340490910100011u17497293o613334c64f1543c8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <3e8340490910100011u17497293o613334c64f1543c8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-10-12 19:03 ` KOSAKI Motohiro [not found] ` <20091013031853.C744.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: KOSAKI Motohiro @ 2009-10-12 19:03 UTC (permalink / raw) To: Bryan Donlan Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA, Timo Sirainen Hi Sorry for the delaying. > On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro > <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: > > >> It does seem like a maximum spin count should be put in there - and > >> maybe a timeout as well (since with FUSE etc it's possible to engineer > >> page faults that take arbitrarily long). > >> Also, it occurs to me that: > > > > makes sense. > > I like maximum spin rather than timeout. > > I'm worried about the scenario where process A sets its cmdline buffer > to point to a page which will take a _VERY_ long time to pagein (maybe > forever), and then process B goes to try to read its cmdline. What > happens now? Honestly, I don't worry about so much. if attacker want DoS attack, fork bomb is efficient than this way. then, attacker never use this. > Process A can arrange for this to happen by using a FUSE filesystem > that sits on a read forever. And since the first thing the admin's > likely to do to track down the problem is 'ps awux', this is liable to > be a rather nasty DoS... Probably, I haven't understand this paragraph. Why is this FUSE related issue? > Of course, this is no worse than it is now - it's already possible to > replace the page in question. But we should think about ways this > could be fixed for good... Plus, please look my mesurement data as another post. seqlock implementation is very fast although contention occured. > >>> + do { > >>> + seq = read_seqbegin(&mm->arg_lock); > >>> + > >>> + len = mm->arg_end - mm->arg_start; > >>> + if (len > PAGE_SIZE) > >>> + len = PAGE_SIZE; > >> > >> If arg_end or arg_start are modified after this, is it truly safe to > >> assume that len will remain <= PAGE_SIZE without a memory barrier > >> before the conditional? > > > > 1) access_process_vm() doesn't return error value. > > 2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len. > > > > then, if arg_{start,end} is modified, access_process_vm() may return 0 > > and strnlen > > makes bad calculation, but read_seqretry() can detect its modify > > rightly. I think. > > No, I'm worried about what if the compiler decides to rewrite like so: > if (mm->arg_end - mm->arg_start > PAGE_SIZE) > len = PAGE_SIZE; > else /* here we reload arg_end/arg_start! */ > len = mm->arg_end - mm->arg_start; > > Now we might write into buffer more than PAGE_SIZE bytes, which is > probably a buffer overrun into kernel space... Rgiht. I'll fix this issue at next spin. Thank you. ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20091013031853.C744.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org>]
* Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() [not found] ` <20091013031853.C744.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-10-12 19:33 ` Bryan Donlan 0 siblings, 0 replies; 13+ messages in thread From: Bryan Donlan @ 2009-10-12 19:33 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper, linux-api-u79uwXL29TY76Z2rM5mHXA, Timo Sirainen On Mon, Oct 12, 2009 at 3:03 PM, KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: > Hi > > Sorry for the delaying. > >> On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro >> <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote: >> >> >> It does seem like a maximum spin count should be put in there - and >> >> maybe a timeout as well (since with FUSE etc it's possible to engineer >> >> page faults that take arbitrarily long). >> >> Also, it occurs to me that: >> > >> > makes sense. >> > I like maximum spin rather than timeout. >> >> I'm worried about the scenario where process A sets its cmdline buffer >> to point to a page which will take a _VERY_ long time to pagein (maybe >> forever), and then process B goes to try to read its cmdline. What >> happens now? > > Honestly, I don't worry about so much. if attacker want DoS attack, fork bomb is > efficient than this way. then, attacker never use this. Fork bombs and etc can be mitigated by resource limits; but if the command line is placed on a page that will take a very long time to fault, then that cannot be mitigated... But again, this DoS already exists and isn't any easier with this patch, so I think it's a separate issue. >> Process A can arrange for this to happen by using a FUSE filesystem >> that sits on a read forever. And since the first thing the admin's >> likely to do to track down the problem is 'ps awux', this is liable to >> be a rather nasty DoS... > > Probably, I haven't understand this paragraph. Why is this FUSE related issue? Just an example of how one can create a page that will take a very long time to fault in. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-10-13 0:03 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-10-09 4:50 [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl() KOSAKI Motohiro 2009-10-10 0:13 ` Andrew Morton 2009-10-10 2:22 ` Bryan Donlan [not found] ` <3e8340490910091922g7891b31al649e91f15ffae687-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2009-10-10 2:42 ` Andrew Morton [not found] ` <20091009194250.eb76e338.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 2009-10-10 2:57 ` Bryan Donlan 2009-10-10 6:32 ` KOSAKI Motohiro [not found] ` <2f11576a0910092332s6e0e3dcs35864e3a2164be0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2009-10-10 6:39 ` Andrew Morton 2009-10-12 19:03 ` KOSAKI Motohiro [not found] ` <20091013022335.C741.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2009-10-12 19:22 ` Andrew Morton [not found] ` <20091012122246.a941013b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 2009-10-13 0:03 ` KOSAKI Motohiro 2009-10-10 7:11 ` Bryan Donlan [not found] ` <3e8340490910100011u17497293o613334c64f1543c8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2009-10-12 19:03 ` KOSAKI Motohiro [not found] ` <20091013031853.C744.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2009-10-12 19:33 ` Bryan Donlan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).