From: Jiri Slaby <jirislaby@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Neil Horman <nhorman@tuxdriver.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Resource limits interface proposal [was: pull request for writable limits]
Date: Wed, 05 May 2010 14:12:54 +0200
Message-ID: <4BE160C6.90404@gmail.com>
In-Reply-To: <alpine.LFD.2.00.1003211128520.18017@i5.linux-foundation.org>

Hi.

On 03/21/2010 07:38 PM, Linus Torvalds wrote:
> Or even just _one_ system call that takes two pointers, and can do an 
> atomic replace-and-return-the-old-value, like 'sigaction()' does, ie 
> something like
> 
> 	int prlimit64(pid, limit, const struct rlimit64 *new, struct rlimit64 *old);
> 
> wouldn't that be a nice generic interface?

So I ended up thinking about these possibilities:

1) The internal representation of limits stays as it is in signal_struct,
i.e. long limits with infinity being ~0ul. This is the least intrusive
solution. The new prlimit64 will convert rlimit64 to rlimit and pass it
down to do_prlimit. With setrlimit and getrlimit being mere wrappers, it
will look like:
prlimit64(pid, resource, new64, old64) ->
    new = convert_to_rlim(new64)
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new, old)
    old64 = convert_to_rlim64(old)
setrlimit(resource, rlim) ->
    do_prlimit(current, resource, rlim, NULL)
getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim)
with appropriate copy_{from,to}_user. (And setrlimit+getrlimit will be
scheduled for removal with all the compat crap around them.)

It may also be that rlimit64 will contain flags like:
#define RLIM64_CUR_INFINITY     0x00000001
#define RLIM64_MAX_INFINITY     0x00000002
struct rlimit64 {
        __u64 rlim_cur;
        __u64 rlim_max;
        __u32 flags;
};
if I understood Alexey correctly that limit values should be separated
from infinity? The flags will then be converted to ~0ul as well when
converting from rlimit64 to rlimit above.

The drawback is that when a 32-bit user passes down a value
>= (1ULL << 32), the call has to fail with EINVAL.
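
A minimal user-space sketch of that conversion, under stated assumptions:
the struct names with the _sketch suffix are hypothetical, uint32_t stands
in for a 32-bit unsigned long, and the signature of convert_to_rlim is a
guess, since only its name appears above. It is meant to illustrate the
EINVAL case and the flags -> ~0ul mapping, not to be kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* Flag bits as proposed in the mail. */
#define RLIM64_CUR_INFINITY 0x00000001
#define RLIM64_MAX_INFINITY 0x00000002

/* Syscall-side structure as proposed in the mail (name is hypothetical). */
struct rlimit64_sketch {
	uint64_t rlim_cur;
	uint64_t rlim_max;
	uint32_t flags;
};

/* Stand-in for struct rlimit on a 32-bit kernel: uint32_t models long. */
struct rlimit32_sketch {
	uint32_t rlim_cur;
	uint32_t rlim_max;
};

#define RLIM32_INFINITY UINT32_MAX	/* ~0ul when long is 32 bits wide */

/* Convert one value; a set infinity flag overrides the numeric value. */
static int convert_one(uint64_t val, int is_inf, uint32_t *out)
{
	if (is_inf) {
		*out = RLIM32_INFINITY;
		return 0;
	}
	/* Does not fit into a 32-bit long: the drawback named in the mail. */
	if (val >= (1ULL << 32))
		return -EINVAL;
	/* Note: a finite value equal to ~0ul would still read as infinity. */
	*out = (uint32_t)val;
	return 0;
}

static int convert_to_rlim(const struct rlimit64_sketch *in,
			   struct rlimit32_sketch *out)
{
	int ret = convert_one(in->rlim_cur,
			      in->flags & RLIM64_CUR_INFINITY, &out->rlim_cur);
	if (ret)
		return ret;
	return convert_one(in->rlim_max,
			   in->flags & RLIM64_MAX_INFINITY, &out->rlim_max);
}
```

A caller would pass the user-supplied structure through convert_to_rlim
first and hand the result to do_prlimit only when it returns 0.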

The pros are: no locking, no magic, longs are naturally atomic. And
sys_prlimit64 still gets an arch-independent parameter.


2) Introduce an rlimit lock and move every user to rlimit helpers which
lock the accesses appropriately, making the locking a nop when
BITS_PER_LONG == 64. Then we can have rlimit64 in signal_struct and
everything will happen on 64-bit limit values.

If we decide to separate infinity from the value with the flags above, we
should also reconsider what infinity will be. Much code just relies on
infinity in rlimit.rlim_{cur,max} being the highest possible value and
doesn't reckon with something like rlimit64.flags. This will result in the
locks not being a nop on 64-bit, because we want fresh rlim_cur+flags and
rlim_max+flags pairs. We could also have the flags solely in the syscall
interface and let ~0ULL count as infinity internally.

In this case the situation will be:
prlimit64(pid, resource, new64, old64) ->
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new64, old64)
setrlimit(resource, rlim) ->
    rlim64 = convert_to_rlim64(rlim)
    do_prlimit(current, resource, rlim64, NULL)
getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim64)
    rlim = convert_to_rlim(rlim64)

We cannot fail in prlimit64 due to limited space in longs on 32-bit;
however, we have added locking, which may slow things down. I have no idea
how contended the lock will be, but as rlimits are used in the scheduler
and filesystem core, it might affect performance. I can measure this if it
is of interest.
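
A rough user-space sketch of what such locked helpers could look like.
Everything here is an assumption made for illustration: the rlim_slot and
rlim64_pair types, the do_prlimit_sketch name, and the tiny spinlock built
on C11 atomic_flag in place of a real kernel lock. Permission checks and
the "nop on 64-bit" variant are left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit limit pair kept per resource. */
struct rlim64_pair {
	uint64_t rlim_cur;
	uint64_t rlim_max;
};

/* One limit plus the lock that guards it (stand-in for the rlimit lock). */
struct rlim_slot {
	atomic_flag lock;
	struct rlim64_pair lim;
};

static void rlim_lock(struct rlim_slot *s)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		;
}

static void rlim_unlock(struct rlim_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Replace-and-return-old under the lock, in the spirit of the
 * prlimit64(pid, resource, new, old) proposal: either pointer may be NULL.
 */
static int do_prlimit_sketch(struct rlim_slot *s,
			     const struct rlim64_pair *new_lim,
			     struct rlim64_pair *old_lim)
{
	rlim_lock(s);
	if (old_lim)
		*old_lim = s->lim;	/* both fields from the same update */
	if (new_lim)
		s->lim = *new_lim;
	rlim_unlock(s);
	return 0;
}
```

Because the read of the old pair and the write of the new one happen under
one lock hold, a caller sees rlim_cur and rlim_max from the same update,
which is the property the plain 64-bit fields cannot give on 32-bit.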


3) [inspired by an idea from Jan Kara, who knows how inode handling
works] It's somewhat similar to 2), we just avoid locks in the same way
the inode->i_size accessors do.

It doesn't solve the case of separate flags though.
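
For reference, the i_size accessors on 32-bit SMP use a sequence counter:
the writer bumps it before and after the update, and readers retry when
they see it change. A user-space sketch of that pattern applied to a limit
pair follows. The rlim_seq type and function names are hypothetical, C11
atomics replace the kernel primitives, and writers are assumed to be
serialized by some other means.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical limit pair guarded by a sequence counter. */
struct rlim_seq {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t rlim_cur;	/* relaxed atomics keep the sketch */
	_Atomic uint64_t rlim_max;	/* free of data races in user space */
};

/* Writer side; callers must not run two writers concurrently. */
static void rlim_seq_write(struct rlim_seq *s, uint64_t cur, uint64_t max)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	/* Mark the update as started (counter becomes odd). */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->rlim_cur, cur, memory_order_relaxed);
	atomic_store_explicit(&s->rlim_max, max, memory_order_relaxed);
	/* Mark the update as finished (counter becomes even again). */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Lockless reader: retry until a stable, even counter brackets the reads. */
static void rlim_seq_read(struct rlim_seq *s, uint64_t *cur, uint64_t *max)
{
	unsigned int s0, s1;

	do {
		s0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		*cur = atomic_load_explicit(&s->rlim_cur, memory_order_relaxed);
		*max = atomic_load_explicit(&s->rlim_max, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((s0 & 1) || s0 != s1);
}
```

Readers never block the writer here; the cost is that a reader may loop
while an update is in flight.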



Just a side note: we cannot use the rlimit64 name, as it is already
reserved in glibc headers for limits handling.

I would appreciate any comments.

thanks,
-- 
js
suse labs
