From: Jiri Slaby
Date: Wed, 05 May 2010 14:12:54 +0200
To: Linus Torvalds
CC: Alexey Dobriyan, LKML, Neil Horman, Oleg Nesterov
Subject: Resource limits interface proposal [was: pull request for writable limits]
X-Mailing-List: linux-kernel@vger.kernel.org

Hi.

On 03/21/2010 07:38 PM, Linus Torvalds wrote:
> Or even just _one_ system call that takes two pointers, and can do an
> atomic replace-and-return-the-old-value, like 'sigaction()' does, ie
> something like
>
>         int prlimit64(pid, limit, const struct rlimit64 *new, struct rlimit64 *old);
>
> wouldn't that be a nice generic interface?

So I ended up thinking about these possibilities:

1) The internal representation of limits stays as is in signal_struct, i.e. long limits with infinity being ~0ul.
This is the least intrusive solution. The new prlimit64 will convert rlimit64 to rlimit and pass it down to do_prlimit. With setrlimit and getrlimit reduced to wrappers, it will look like:

  prlimit64(pid, resource, new64, old64) ->
    new = convert_to_rlim(new64)
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new, old)
    old64 = convert_to_rlim64(old)

  setrlimit(resource, rlim) ->
    do_prlimit(current, resource, rlim, NULL)

  getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim)

with the appropriate copy_{from,to}_user calls. (And setrlimit+getrlimit will be scheduled for removal, together with all the compat crap around them.)

It may also be that rlimit64 will contain flags like:

  #define RLIM64_CUR_INFINITY 0x00000001
  #define RLIM64_MAX_INFINITY 0x00000002

  struct rlimit64 {
          __u64 rlim_cur;
          __u64 rlim_max;
          __u32 flags;
  };

to separate limit values from infinity, if I understood Alexey correctly? The flags will then also be converted to ~0ul when converting from rlimit64 to rlimit above.

The drawback is that on 32-bit, passing down a value >= 2^32 (which does not fit in a long) will fail with EINVAL. The pros are: no locking and no magic, because longs are naturally atomic. The parameter of sys_prlimit64 stays arch-independent.

2) Introduce an rlimit lock and move every user to rlimit helpers which lock the accesses appropriately, with the locking being a nop when BITS_PER_LONG == 64. Then we can have rlimit64 in signal_struct and everything will operate on 64-bit limit values.

If we decide to separate infinity from the value with the flags above, we should also reconsider what infinity will be. Much code simply relies on an infinite rlim_{cur,max} being the highest possible value (so that comparisons against it never trigger) and knows nothing about anything like rlimit64.flags. This would also make the locks not a nop on 64-bit, because we want consistent rlim_cur+flags and rlim_max+flags pairs.

We could also keep the flags solely in the syscall interface and treat ~0ULL as infinity internally.
In this case the situation will be:

  prlimit64(pid, resource, new64, old64) ->
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new64, old64)

  setrlimit(resource, rlim) ->
    rlim64 = convert_to_rlim64(rlim)
    do_prlimit(current, resource, rlim64, NULL)

  getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim64)
    rlim = convert_to_rlim(rlim64)

prlimit64 can no longer fail because of the limited range of longs on 32-bit. However, we add locking, which may slow things down. I have no idea how contended the lock will be, but since rlimits are used in the scheduler and the filesystem core, it might affect performance. I can measure it if there is interest.

3) [inspired by an idea from Jan Kara, who knows how inode handling works] This is similar to 2), except that we avoid the locks the same way the inode->i_size accessors do (i_size_read()/i_size_write(), which use a seqcount on 32-bit SMP). It doesn't solve the case of separate flags, though.

Just a side note: we cannot use the name rlimit64, because glibc headers already reserve it for limits handling.

I will appreciate any comments.

thanks,
-- 
js
suse labs
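The conversion step of option 1, where the 64-bit syscall form is turned into the unchanged long-based internal form on a 32-bit kernel, can be sketched in plain C. This is a user-space model under stated assumptions, not kernel code. `struct rlimit64_arg`, `klimit_t`, `struct klimit` and `to_klimit` are hypothetical names (rlimit64 itself is taken by glibc). uint32_t stands in for a 32-bit unsigned long. Reporting UINT64_MAX alongside the infinity flags on the way back is an arbitrary choice.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as proposed in the mail. */
#define RLIM64_CUR_INFINITY 0x00000001u
#define RLIM64_MAX_INFINITY 0x00000002u

/* Hypothetical name: "rlimit64" itself is already taken by glibc headers. */
struct rlimit64_arg {
	uint64_t rlim_cur;
	uint64_t rlim_max;
	uint32_t flags;
};

/*
 * Model of the unchanged internal representation on a 32-bit kernel:
 * unsigned long limits with ~0ul meaning infinity. uint32_t stands in
 * for the 32-bit unsigned long so the sketch behaves the same on any host.
 */
typedef uint32_t klimit_t;
#define KLIMIT_INFINITY ((klimit_t)~(klimit_t)0)

struct klimit {
	klimit_t rlim_cur;
	klimit_t rlim_max;
};

/* Convert one value from the 64-bit syscall form to the internal form. */
static int to_klimit(uint64_t val, int infinite, klimit_t *out)
{
	if (infinite) {
		*out = KLIMIT_INFINITY;
		return 0;
	}
	/* Values >= 2^32 do not fit in a 32-bit long: fail with EINVAL.
	 * Note that a finite 0xffffffff still becomes indistinguishable
	 * from infinity in this model. */
	if (val > KLIMIT_INFINITY)
		return -EINVAL;
	*out = (klimit_t)val;
	return 0;
}

/* Hypothetical counterpart of convert_to_rlim() from the mail. */
static int convert_to_rlim(const struct rlimit64_arg *in, struct klimit *out)
{
	struct klimit tmp;
	int err;

	err = to_klimit(in->rlim_cur, in->flags & RLIM64_CUR_INFINITY, &tmp.rlim_cur);
	if (err)
		return err;
	err = to_klimit(in->rlim_max, in->flags & RLIM64_MAX_INFINITY, &tmp.rlim_max);
	if (err)
		return err;
	*out = tmp; /* written only when both values converted */
	return 0;
}

/* Hypothetical counterpart of convert_to_rlim64(): this direction cannot fail. */
static void convert_to_rlim64(const struct klimit *in, struct rlimit64_arg *out)
{
	out->flags = 0;
	out->rlim_cur = in->rlim_cur;
	out->rlim_max = in->rlim_max;
	if (in->rlim_cur == KLIMIT_INFINITY) {
		out->flags |= RLIM64_CUR_INFINITY;
		out->rlim_cur = UINT64_MAX;
	}
	if (in->rlim_max == KLIMIT_INFINITY) {
		out->flags |= RLIM64_MAX_INFINITY;
		out->rlim_max = UINT64_MAX;
	}
}
```

Only the 64-to-long direction can fail, which matches the drawback noted for option 1. The result goes through a temporary, so a rejected request leaves the caller's limits untouched.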