* akpm@osdl.org,mtk.manpages@gmail.com
@ 2008-07-28 15:21 Narendra Prasad Madanapalli
  2008-07-28 15:30 ` 64-bit rlimits Alexey Dobriyan
  0 siblings, 1 reply; 3+ messages in thread
From: Narendra Prasad Madanapalli @ 2008-07-28 15:21 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3573 bytes --]

Problem Description:
    The following issue affects the setrlimit() and getrlimit() system
calls on Linux 2.6.13 (and earlier) on x86.
    The problem is filed as kernel.org bug 5042
(http://bugzilla.kernel.org/show_bug.cgi?id=5042).

    With setrlimit()/getrlimit(), resource limits cannot be set above
2^32-1 on x86 because, internally, resource limits are represented in
the 'rlimit' structure (defined in include/linux/resource.h) as unsigned
longs, which are 32 bits wide on x86. The most pertinent limit here is
RLIMIT_FSIZE, which specifies the maximum size to which a file can
grow: to be useful, this limit must be represented using a type that
is as wide as the type used to represent file offsets, i.e., as wide
as a 64-bit off_t.
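
    The width mismatch can be sketched in user space. The following is
only an illustration: rlim32_t, rlim64_t and fits_in_rlim32() are
hypothetical names that model a 32-bit 'unsigned long' limit field and a
64-bit file-offset-sized limit; none of them are kernel or glibc
identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the limit field on 32-bit x86 (unsigned long). */
typedef uint32_t rlim32_t;
/* Hypothetical model of a limit as wide as a 64-bit off_t. */
typedef uint64_t rlim64_t;

static_assert(sizeof(rlim32_t) < sizeof(rlim64_t),
              "the 32-bit field is narrower than a 64-bit file offset");

/* True if a 64-bit limit survives a round trip through the 32-bit field. */
bool fits_in_rlim32(rlim64_t limit)
{
    rlim32_t narrowed = (rlim32_t)limit;   /* keeps only the low 32 bits */
    return (rlim64_t)narrowed == limit;
}

/* Example: a 4 GiB file-size limit cannot be stored in the 32-bit field. */
void fits_in_rlim32_examples(void)
{
    assert(fits_in_rlim32(1000));
    assert(!fits_in_rlim32((rlim64_t)1 << 32));
}
```

    Any RLIMIT_FSIZE value of 4 GiB or more falls into the second case,
even though a 64-bit off_t can address such files.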

    Current versions of glibc (e.g., 2.3.5) deal with this situation
somewhat strangely: if a program compiled with _FILE_OFFSET_BITS set to
64 (so that off_t is a 64-bit 'long long') tries to set a resource limit
to a value larger than can be represented in a 32-bit unsigned long, the
glibc wrapper for setrlimit() silently converts the limit value to
RLIM_INFINITY. In other words, the requested resource limit setting is
silently ignored. (One could argue that the glibc wrapper should return
an error rather than silently turning a very large limit into infinity;
however, the glibc developers seem to have chosen the current behaviour
as a means of dealing with what is fundamentally a kernel problem.)
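
    The effect described above can be modelled as follows. This is a
sketch of the observable behaviour only, not glibc's actual code;
RLIM32_INFINITY and narrow_limit() are hypothetical names chosen for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models RLIM_INFINITY (~0UL) as seen by the 32-bit system call. */
#define RLIM32_INFINITY UINT32_MAX

/*
 * Illustration only: any request that does not fit below the 32-bit
 * infinity value is turned into "no limit" instead of being rejected.
 */
uint32_t narrow_limit(uint64_t requested)
{
    if (requested >= RLIM32_INFINITY)
        return RLIM32_INFINITY;
    return (uint32_t)requested;
}

/* Example: a 1 TiB request silently becomes "unlimited". */
void narrow_limit_examples(void)
{
    assert(narrow_limit(4096) == 4096);
    assert(narrow_limit((uint64_t)1 << 40) == RLIM32_INFINITY);
}
```

    A caller that asks for a 1 TiB file-size limit therefore ends up with
no limit at all and receives no error to tell it so.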

    (NOTE: This problem is not merely a theoretical one facing
programmers developing new applications. Since many x86 distributions
compile all (file) utilities with -D_FILE_OFFSET_BITS=64, this issue can
bite end-users as well, if they expect to be able to set resource limits
greater than 2^32-1.)

    The solution to this problem would require new setrlimit64() and
getrlimit64() system calls on x86, and the existing 32-bit system calls
would need to be retained so that existing binaries would still run.

Design Approach:
    Add two system calls, sys_setrlimit64() and sys_getrlimit64().
    Add a type 'struct rlimit64' to accommodate limits up to 2^64-1.

Implementation Details:
    Additions: 'struct rlimit64', and the array
'struct rlimit64 rlim64[RLIM64_NLIMITS]' in signal_struct (reached
through task_struct->signal).
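
    The permission rule that sys_setrlimit64() in the attached patch
applies to the 64-bit limits can be sketched in user space as below.
struct rlimit64_model and rlimit64_change_allowed() are hypothetical
names, and the 'privileged' flag stands in for capable(CAP_SYS_RESOURCE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the proposed struct rlimit64. */
struct rlimit64_model {
    uint64_t cur;   /* soft limit */
    uint64_t max;   /* hard limit */
};

/*
 * Mirrors the check in the patch: raising either the soft or the hard
 * limit above the current hard limit requires privilege.
 */
bool rlimit64_change_allowed(const struct rlimit64_model *old_lim,
                             const struct rlimit64_model *new_lim,
                             bool privileged)
{
    if (new_lim->cur > old_lim->max || new_lim->max > old_lim->max)
        return privileged;
    return true;
}

/* Example: lowering is always allowed, raising past the hard limit is not. */
void rlimit64_change_examples(void)
{
    struct rlimit64_model old_lim = { 1ULL << 33, 1ULL << 34 };
    struct rlimit64_model lower   = { 1ULL << 32, 1ULL << 33 };
    struct rlimit64_model higher  = { 1ULL << 33, 1ULL << 35 };

    assert(rlimit64_change_allowed(&old_lim, &lower, false));
    assert(!rlimit64_change_allowed(&old_lim, &higher, false));
    assert(rlimit64_change_allowed(&old_lim, &higher, true));
}
```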

Test Results:
    Test results are posted as Comment #6 at
http://bugzilla.kernel.org/show_bug.cgi?id=5042
    Open Issue:
        Although the limits in 'rlim64' are initialized to
RLIM64_INFINITY in linux/init_task.h, they contain garbage values at
run time. I placed some printks in sys_getrlimit64 to print the values
of 'rlim64'; the printk statements are executed whenever getrlimit64()
is invoked.

The output of dmesg is as follows:
[  111.221402] resource = 1, RLIM64_INFINITY = ffffffffffffffff, RLIMIT_FSIZE = 1, RLIM64_NLIMITS = 2
[  111.221411] current rlim64            :  max64 = f4967194, cur64 = f496719400000001
[  111.221416] value (local var, before) :  max64 = c02f9730b7f4cce0, cur64 =b7f18ff4f4ae5e00
[  111.221421] value (after assignment)  : max64 = f4967194, cur64 = f496719400000001
[  118.437395] resource = 1, RLIM64_INFINITY = ffffffffffffffff, RLIMIT_FSIZE = 1, RLIM64_NLIMITS = 2
[  118.437406] current rlim64            :  max64 = f499ff98f499ff98, cur64 = fe86a13df498a628
[  118.437411] value (local var, before) :  max64 = c02f9730b7f94ce0, cur64 = b7f60ff4f4b41e00
[  118.437419] value (after assignment)  : max64 = f499ff98f499ff98, cur64 = fe86a13df498a628
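
    One possible explanation, stated as an assumption and not verified
against this patch: the signal_struct of a new process is allocated
without being zeroed, and the fork path copies only 'rlim' from the
parent, so 'rlim64' is initialized for the init task alone. The
user-space model below illustrates that effect; every name in it is
hypothetical, and the memset() pattern merely simulates stale allocator
contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NLIMITS    16
#define MODEL_NLIMITS64  2
#define MODEL_INFINITY64 UINT64_MAX

struct lim_model   { unsigned long cur, max; };
struct lim64_model { uint64_t cur, max; };

/* Hypothetical model of the two limit arrays kept per process. */
struct signal_model {
    struct lim_model   rlim[MODEL_NLIMITS];
    struct lim64_model rlim64[MODEL_NLIMITS64];
};

/* Simulates an allocation that is not zeroed: fill with a stale pattern. */
void model_alloc_unzeroed(struct signal_model *child)
{
    memset(child, 0xA5, sizeof *child);
}

/* Copies only the 32-bit array; rlim64 keeps whatever was in memory. */
void copy_limits_rlim_only(struct signal_model *child,
                           const struct signal_model *parent)
{
    memcpy(child->rlim, parent->rlim, sizeof child->rlim);
}

/* Copies both arrays, so the child inherits the parent's 64-bit limits. */
void copy_limits_both(struct signal_model *child,
                      const struct signal_model *parent)
{
    memcpy(child->rlim, parent->rlim, sizeof child->rlim);
    memcpy(child->rlim64, parent->rlim64, sizeof child->rlim64);
}

/* Example: the child sees garbage until rlim64 is copied as well. */
void copy_limits_examples(void)
{
    struct signal_model parent, child;

    memset(&parent, 0, sizeof parent);
    parent.rlim64[1].cur = MODEL_INFINITY64;
    parent.rlim64[1].max = MODEL_INFINITY64;

    model_alloc_unzeroed(&child);
    copy_limits_rlim_only(&child, &parent);
    assert(child.rlim64[1].cur != MODEL_INFINITY64);

    copy_limits_both(&child, &parent);
    assert(child.rlim64[1].cur == MODEL_INFINITY64);
}
```

    If this assumption holds, copying rlim64 next to rlim when a new
signal_struct is set up would be the place to look first.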


    Signed-off-by: Narendra Prasad Madanapalli <narendramind@gmail.com>

[-- Attachment #2: patch-2.6.26-rlim64 --]
[-- Type: application/octet-stream, Size: 10744 bytes --]

 arch/x86/kernel/syscall_table_32.S |    2 
 include/asm-generic/resource.h     |   11 +++
 include/asm-x86/unistd_32.h        |    2 
 include/linux/init_task.h          |    1 
 include/linux/resource.h           |    5 +
 include/linux/sched.h              |    1 
 kernel/ChangeLog                   |   10 ++
 kernel/sys.c                       |  126 +++++++++++++++++++++++++++++++++++++
 8 files changed, 158 insertions(+)

diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/arch/x86/kernel/syscall_table_32.S linux-2.6.26-rlim64/arch/x86/kernel/syscall_table_32.S
--- linux-2.6.26-base/arch/x86/kernel/syscall_table_32.S	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/arch/x86/kernel/syscall_table_32.S	2008-07-20 21:06:22.000000000 +0530
@@ -326,3 +326,5 @@ ENTRY(sys_call_table)
 	.long sys_fallocate
 	.long sys_timerfd_settime	/* 325 */
 	.long sys_timerfd_gettime
+        .long sys_setrlimit64
+        .long sys_getrlimit64
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/include/asm-generic/resource.h linux-2.6.26-rlim64/include/asm-generic/resource.h
--- linux-2.6.26-base/include/asm-generic/resource.h	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/include/asm-generic/resource.h	2008-07-20 21:10:48.000000000 +0530
@@ -46,6 +46,7 @@
 #define RLIMIT_RTPRIO		14	/* maximum realtime priority */
 #define RLIMIT_RTTIME		15	/* timeout for RT tasks in us */
 #define RLIM_NLIMITS		16
+#define RLIM64_NLIMITS          2
 
 /*
  * SuS says limits have to be unsigned.
@@ -57,6 +58,10 @@
 # define RLIM_INFINITY		(~0UL)
 #endif
 
+#ifndef RLIM64_INFINITY
+# define RLIM64_INFINITY        (~0ULL)
+#endif
+
 /*
  * RLIMIT_STACK default maximum - some architectures override it:
  */
@@ -89,6 +94,12 @@
 	[RLIMIT_RTTIME]		= {  RLIM_INFINITY,  RLIM_INFINITY },	\
 }
 
+#define INIT_RLIMITS64                                                    \
+{                                                                         \
+        [0]                     = {  RLIM64_INFINITY,  RLIM64_INFINITY }, \
+        [RLIMIT_FSIZE]          = {  RLIM64_INFINITY,  RLIM64_INFINITY }, \
+}
+
 #endif	/* __KERNEL__ */
 
 #endif
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/include/asm-x86/unistd_32.h linux-2.6.26-rlim64/include/asm-x86/unistd_32.h
--- linux-2.6.26-base/include/asm-x86/unistd_32.h	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/include/asm-x86/unistd_32.h	2008-07-20 21:12:41.000000000 +0530
@@ -332,6 +332,8 @@
 #define __NR_fallocate		324
 #define __NR_timerfd_settime	325
 #define __NR_timerfd_gettime	326
+#define __NR_setrlimit64        327
+#define __NR_getrlimit64        328
 
 #ifdef __KERNEL__
 
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/include/linux/init_task.h linux-2.6.26-rlim64/include/linux/init_task.h
--- linux-2.6.26-base/include/linux/init_task.h	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/include/linux/init_task.h	2008-07-20 21:14:24.000000000 +0530
@@ -47,6 +47,7 @@ extern struct files_struct init_files;
 	.posix_timers	 = LIST_HEAD_INIT(sig.posix_timers),		\
 	.cpu_timers	= INIT_CPU_TIMERS(sig.cpu_timers),		\
 	.rlim		= INIT_RLIMITS,					\
+        .rlim64         = INIT_RLIMITS64,                               \
 }
 
 extern struct nsproxy init_nsproxy;
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/include/linux/resource.h linux-2.6.26-rlim64/include/linux/resource.h
--- linux-2.6.26-base/include/linux/resource.h	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/include/linux/resource.h	2008-07-20 21:15:22.000000000 +0530
@@ -45,6 +45,11 @@ struct rlimit {
 	unsigned long	rlim_max;
 };
 
+struct rlimit64 {
+        u64   rlim64_cur;
+        u64   rlim64_max;
+};
+
 #define	PRIO_MIN	(-20)
 #define	PRIO_MAX	20
 
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/include/linux/sched.h linux-2.6.26-rlim64/include/linux/sched.h
--- linux-2.6.26-base/include/linux/sched.h	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/include/linux/sched.h	2008-07-20 21:17:05.000000000 +0530
@@ -523,6 +523,7 @@ struct signal_struct {
 	 * have no need to disable irqs.
 	 */
 	struct rlimit rlim[RLIM_NLIMITS];
+        struct rlimit64 rlim64[RLIM64_NLIMITS];
 
 	struct list_head cpu_timers[3];
 
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/kernel/ChangeLog linux-2.6.26-rlim64/kernel/ChangeLog
--- linux-2.6.26-base/kernel/ChangeLog	1970-01-01 05:30:00.000000000 +0530
+++ linux-2.6.26-rlim64/kernel/ChangeLog	2008-07-28 16:06:15.000000000 +0530
@@ -0,0 +1,10 @@
+2008-07-28  Narendra Prasad <narendramind@gmail.com>
+    Problem Description:
+        The following issue affects the setrlimit() and getrlimit() system calls on Linux 2.6.13 (and earlier) on x86.
+        The problem is filed as kernel.org bug 5042 (http://bugzilla.kernel.org/show_bug.cgi?id=5042)
+    Design Approach:
+        Add two system calls, sys_setrlimit64() and sys_getrlimit64().
+        Add a type 'struct rlimit64' to accommodate limits up to 2^64-1.
+    Implementation Details:
+        Additions: struct rlimit64, struct rlimit64
+        rlim64[RLIM64_NLIMITS] in signal_struct
diff -uNrp -X linux-2.6.26-base/Documentation/dontdiff linux-2.6.26-base/kernel/sys.c linux-2.6.26-rlim64/kernel/sys.c
--- linux-2.6.26-base/kernel/sys.c	2008-07-14 03:21:29.000000000 +0530
+++ linux-2.6.26-rlim64/kernel/sys.c	2008-07-28 15:41:02.000000000 +0530
@@ -1524,6 +1524,132 @@ out:
 	return 0;
 }
 
+asmlinkage long sys_getrlimit64(unsigned int resource, struct rlimit64 __user *rlim)
+{
+    struct rlimit64  value;
+
+    if (resource >= RLIM_NLIMITS)
+        return -EINVAL;
+
+    printk("\nresource = %d, RLIM64_INFINITY = %llx, RLIMIT_FSIZE = %d, RLIM64_NLIMITS = %d", 
+           resource, RLIM64_INFINITY, RLIMIT_FSIZE, RLIM64_NLIMITS);
+    if (resource == RLIMIT_FSIZE) {
+        task_lock(current->group_leader);
+        printk("\ncurrent rlim64               : max64 = %llx, cur64 = %llx", 
+               current->signal->rlim64[resource].rlim64_max, current->signal->rlim64[resource].rlim64_cur);
+        printk("\nvalue (local var, before)    : max64 = %llx, cur64 = %llx", value.rlim64_max, value.rlim64_cur);
+        value = current->signal->rlim64[resource];
+        printk("\nvalue (after assignment)     : max64 = %llx, cur64 = %llx", value.rlim64_max, value.rlim64_cur);
+        task_unlock(current->group_leader);
+        return copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0;
+    }
+    else {
+        task_lock(current->group_leader);
+        value.rlim64_max = current->signal->rlim[resource].rlim_max;
+        value.rlim64_cur = current->signal->rlim[resource].rlim_cur;
+        task_unlock(current->group_leader);
+        if (value.rlim64_cur == RLIM_INFINITY)
+            value.rlim64_cur = RLIM64_INFINITY;
+        if (value.rlim64_max == RLIM_INFINITY)
+            value.rlim64_max = RLIM64_INFINITY;
+        /* XX: RLIM_SAVED_MAX ? RLIM_SAVED_CUR ? (See Large-File-Summit) */
+    }
+    return (copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0);
+}
+
+asmlinkage long sys_setrlimit64(unsigned int resource, struct rlimit64 __user *rlim)
+{
+    struct rlimit64  new_rlim;
+    struct rlimit    *old_rlim, new_value;
+    unsigned long    it_prof_secs;
+    int              retval;
+
+    if (resource >= RLIM_NLIMITS)
+        return -EINVAL;
+    if(copy_from_user(&new_rlim, rlim, sizeof(*rlim)))
+        return -EFAULT;
+
+    if (resource == RLIMIT_FSIZE) {
+        struct rlimit64  *old_rlim;
+        struct rlimit    *old_value;
+
+        old_rlim = current->signal->rlim64 + resource;
+        if (((new_rlim.rlim64_cur > old_rlim->rlim64_max) ||
+             (new_rlim.rlim64_max > old_rlim->rlim64_max)) &&
+             !capable(CAP_SYS_RESOURCE))
+            return -EPERM;
+        *old_rlim = new_rlim;
+        if (new_rlim.rlim64_cur > RLIM_INFINITY)
+               new_rlim.rlim64_cur = RLIM_INFINITY;
+        if (new_rlim.rlim64_max > RLIM_INFINITY)
+               new_rlim.rlim64_max = RLIM_INFINITY;
+        task_lock(current->group_leader);
+        old_value = (current->signal->rlim + resource);
+        old_value->rlim_max = new_rlim.rlim64_max;
+        old_value->rlim_cur = new_rlim.rlim64_cur;
+        task_unlock(current->group_leader);
+        return 0;
+    }
+
+    old_rlim = current->signal->rlim + resource;
+    if (new_rlim.rlim64_cur > RLIM_INFINITY)  new_rlim.rlim64_cur = RLIM_INFINITY;
+    if (new_rlim.rlim64_max > RLIM_INFINITY)  new_rlim.rlim64_max = RLIM_INFINITY;
+    if (((new_rlim.rlim64_cur > old_rlim->rlim_max) ||
+         (new_rlim.rlim64_max > old_rlim->rlim_max)) &&
+          !capable(CAP_SYS_RESOURCE))
+        return -EPERM;
+    if (resource == RLIMIT_NOFILE) {
+        if (new_rlim.rlim64_cur > INR_OPEN || new_rlim.rlim64_max > INR_OPEN)
+            return -EPERM;
+    }
+    new_value.rlim_max = new_rlim.rlim64_max;
+    new_value.rlim_cur = new_rlim.rlim64_cur;
+    retval = security_task_setrlimit(resource, &new_value);
+    if (retval)
+        return retval;
+
+    if (resource == RLIMIT_CPU && new_value.rlim_cur == 0) {
+        /*
+         * The caller is asking for an immediate RLIMIT_CPU
+         * expiry.  But we use the zero value to mean "it was
+         * never set".  So let's cheat and make it one second
+         * instead
+         */
+        new_value.rlim_cur = 1;
+    }
+
+    task_lock(current->group_leader);
+    *old_rlim = new_value;
+    task_unlock(current->group_leader);
+
+    if (resource != RLIMIT_CPU)
+        goto out;
+
+    /*
+     * RLIMIT_CPU handling.   Note that the kernel fails to return an error
+     * code if it rejected the user's attempt to set RLIMIT_CPU.  This is a
+     * very long-standing error, and fixing it now risks breakage of
+     * applications, so we live with it
+     */
+    if (new_value.rlim_cur == RLIM_INFINITY)
+        goto out;
+
+    it_prof_secs = cputime_to_secs(current->signal->it_prof_expires);
+    if (it_prof_secs == 0 || new_value.rlim_cur <= it_prof_secs) {
+        unsigned long  rlim_cur = new_value.rlim_cur;
+        cputime_t      cputime;
+
+        cputime = secs_to_cputime(rlim_cur);
+        read_lock(&tasklist_lock);
+        spin_lock_irq(&current->sighand->siglock);
+        set_process_cpu_timer(current, CPUCLOCK_PROF, &cputime, NULL);
+        spin_unlock_irq(&current->sighand->siglock);
+        read_unlock(&tasklist_lock);
+    }
+out:
+        return 0;
+}
+
 /*
  * It would make sense to put struct rusage in the task_struct,
  * except that would make the task_struct be *really big*.  After


* 64-bit rlimits
  2008-07-28 15:21 akpm@osdl.org,mtk.manpages@gmail.com Narendra Prasad Madanapalli
@ 2008-07-28 15:30 ` Alexey Dobriyan
  2008-07-28 20:25   ` Theodore Tso
  0 siblings, 1 reply; 3+ messages in thread
From: Alexey Dobriyan @ 2008-07-28 15:30 UTC (permalink / raw)
  To: Narendra Prasad Madanapalli; +Cc: linux-kernel, linux-fsdevel

Having in-kernel 32-bit AND 64-bit rlimits is plain wrong, sorry.



* Re: 64-bit rlimits
  2008-07-28 15:30 ` 64-bit rlimits Alexey Dobriyan
@ 2008-07-28 20:25   ` Theodore Tso
  0 siblings, 0 replies; 3+ messages in thread
From: Theodore Tso @ 2008-07-28 20:25 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Narendra Prasad Madanapalli, linux-kernel, linux-fsdevel

On Mon, Jul 28, 2008 at 07:30:29PM +0400, Alexey Dobriyan wrote:
> Having in-kernel 32-bit AND 64-bit rlimits is plain wrong, sorry.

For backwards compatibility reasons, if we want to support 64-bit
rlimits on 32-bit x86 architectures, it is inevitable.  It is the
only correct way to do things.

	      	     	 		      - Ted
