* mlockall() with pid parameter
@ 2016-12-07 15:39 Federico Reghenzani
  2016-12-07 16:21 ` Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-07 15:39 UTC
  To: linux-mm

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Hello,

I'm working on Real-Time applications in Linux. `mlockall()` is a typical
syscall used in RT processes in order to avoid page faults. However, the
use of this syscall is strongly limited by ulimits, so basically all RT
processes that want to call `mlockall()` have to be executed with root
privileges.

What I would like to have is a syscall that accepts a "pid", so that a
process spawned by root would be able to enforce memory locking on other
non-root processes. The prototypes would be:

int mlockall(int flags, pid_t pid);
int munlockall(pid_t pid);

I checked the source code and it seems quite easy to me to add this
syscall variant.

I'm writing here to get feedback before starting to edit the code. Do you
think that this is a good approach?

Thank you,
Federico

-- 
*Federico Reghenzani*
PhD Candidate
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria

[-- Attachment #2: Type: text/html, Size: 1337 bytes --]

* Re: mlockall() with pid parameter
  2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
@ 2016-12-07 16:21 ` Vlastimil Babka
  2016-12-07 16:33   ` Federico Reghenzani
  2016-12-07 17:40 ` Dave Hansen
  2016-12-07 18:42 ` Kirill A. Shutemov
  2 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-12-07 16:21 UTC
  To: Federico Reghenzani, linux-mm

On 12/07/2016 04:39 PM, Federico Reghenzani wrote:
> Hello,
>
> I'm working on Real-Time applications in Linux. `mlockall()` is a
> typical syscall used in RT processes in order to avoid page faults.
> However, the use of this syscall is strongly limited by ulimits, so
> basically all RT processes that want to call `mlockall()` have to be
> executed with root privileges.

Is it not possible to change the ulimits with e.g. prlimit?

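For illustration, a minimal sketch of the prlimit() route asked about above: a
privileged manager adjusts RLIMIT_MEMLOCK of another process by pid. The helper
names set_memlock_limit() and get_memlock_limit() are hypothetical and do not
come from the thread. Per prlimit(2), changing the limits of a process owned by
a different user requires CAP_SYS_RESOURCE, and raising a hard limit requires
it as well.

```c
#define _GNU_SOURCE		/* prlimit() is a GNU/Linux extension */
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical helper: set RLIMIT_MEMLOCK (in bytes) for process `pid`.
 * pid == 0 means the calling process.
 * Returns 0 on success or a negative errno value.
 */
int set_memlock_limit(pid_t pid, rlim_t soft, rlim_t hard)
{
	struct rlimit new_limit = { .rlim_cur = soft, .rlim_max = hard };

	if (prlimit(pid, RLIMIT_MEMLOCK, &new_limit, NULL) == -1)
		return -errno;
	return 0;
}

/* Hypothetical helper: read the current RLIMIT_MEMLOCK of `pid` into *out. */
int get_memlock_limit(pid_t pid, struct rlimit *out)
{
	if (prlimit(pid, RLIMIT_MEMLOCK, NULL, out) == -1)
		return -errno;
	return 0;
}
```

A manager could call set_memlock_limit(target_pid, RLIM_INFINITY, RLIM_INFINITY)
before the target calls mlockall(). The ordering between the two processes still
has to be arranged separately.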
* Re: mlockall() with pid parameter
  2016-12-07 16:21 ` Vlastimil Babka
@ 2016-12-07 16:33   ` Federico Reghenzani
  2016-12-07 20:01     ` Vlastimil Babka
  0 siblings, 1 reply; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-07 16:33 UTC
  To: Vlastimil Babka; +Cc: Federico Reghenzani, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1752 bytes --]

2016-12-07 17:21 GMT+01:00 Vlastimil Babka <vbabka@suse.cz>:

> On 12/07/2016 04:39 PM, Federico Reghenzani wrote:
> > However, the use of this syscall is strongly limited by ulimits, so
> > basically all RT processes that want to call `mlockall()` have to be
> > executed with root privileges.
>
> Is it not possible to change the ulimits with e.g. prlimit?
>

Yes, but it requires synchronization between the non-root process and the
root process, because the root process has to change the limits before the
non-root process executes mlockall().

Just to provide an example, another syscall used in RT tasks is
sched_setscheduler(), which is also subject to ulimits, but it accepts a
pid, so a root process can enforce the scheduling policy on any other
process.

[-- Attachment #2: Type: text/html, Size: 2863 bytes --]

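For illustration, a minimal sketch of the sched_setscheduler() pattern referred
to above, where the caller names the target by pid. The helper name
set_policy() is hypothetical. For an unprivileged caller, real-time policies
are bounded by RLIMIT_RTPRIO; a caller with CAP_SYS_NICE is not bound by that
limit.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: apply `policy` with `priority` to process `pid`.
 * pid == 0 means the calling thread.
 * Returns 0 on success or a negative errno value.
 */
int set_policy(pid_t pid, int policy, int priority)
{
	struct sched_param param = { .sched_priority = priority };

	if (sched_setscheduler(pid, policy, &param) == -1)
		return -errno;
	return 0;
}
```

A privileged manager could call set_policy(target_pid, SCHED_FIFO, 10) without
any cooperation from the target, which is the property the proposal would like
mlockall() to have as well.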
* Re: mlockall() with pid parameter
  2016-12-07 16:33   ` Federico Reghenzani
@ 2016-12-07 20:01     ` Vlastimil Babka
  2016-12-08 12:58       ` Federico Reghenzani
  0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-12-07 20:01 UTC
  To: Federico Reghenzani; +Cc: linux-mm

On 12/07/2016 05:33 PM, Federico Reghenzani wrote:
> 2016-12-07 17:21 GMT+01:00 Vlastimil Babka <vbabka@suse.cz>:
>
> > Is it not possible to change the ulimits with e.g. prlimit?
>
> Yes, but it requires a synchronization between non-root process and root
> process.
> Because the root process has to change the limits before the non-root
> process executes the mlockall().

Would it work if you did that between fork() and exec()? If you can
spawn them like this, that is.

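For illustration, a minimal sketch of the fork()/exec() suggestion above: the
launcher sets RLIMIT_MEMLOCK in the child after fork() and before exec(), so
the limit is already in place when the new program starts (resource limits are
preserved across exec). The helper name spawn_with_memlock_limit() is
hypothetical, and the place where a root launcher would drop privileges is only
marked with a comment.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with RLIMIT_MEMLOCK set to `limit` bytes.
 * Returns the child's exit status, or -1 if it could not be spawned or did
 * not exit normally.
 */
int spawn_with_memlock_limit(char *const argv[], rlim_t limit)
{
	int status;
	pid_t child = fork();

	if (child == -1)
		return -1;

	if (child == 0) {
		struct rlimit rl = { .rlim_cur = limit, .rlim_max = limit };

		/* Runs in the child, between fork() and exec(). */
		if (setrlimit(RLIMIT_MEMLOCK, &rl) == -1)
			_exit(126);

		/* A root launcher would drop privileges here. */

		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}

	if (waitpid(child, &status, 0) == -1)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only helps when the manager is also the process that spawns the RT tasks,
which is the caveat stated in the message above.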
* Re: mlockall() with pid parameter
  2016-12-07 20:01     ` Vlastimil Babka
@ 2016-12-08 12:58       ` Federico Reghenzani
  2016-12-09 15:36         ` Federico Reghenzani
  0 siblings, 1 reply; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-08 12:58 UTC
  To: Vlastimil Babka, kirill; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 4567 bytes --]

Ok, these solutions are feasible but not very convenient. Let me explain
better what I'm trying to do.

I'm a developer of the Barbeque Open Source Project
<http://bosp.dei.polimi.it/>, which is a run-time resource manager. It is
basically composed of a daemon (barbeque) and a library (rtlib) linked with
user applications. A user starts a process linked with rtlib, which
exchanges some information with Barbeque (e.g. it requests a performance
goal). Barbeque is in charge of assigning resources, trying to maintain the
performance goals of all applications and the predefined system requirements
(e.g. temperatures and power consumption).

When processes start, Barbeque tunes several parameters at run-time: it
creates and sets CGroups, selects cpu governors and frequency, etc. In the
case of a real-time process it decides the scheduling policy, the scheduling
parameters, etc. Barbeque runs with root privileges, thus it has the
CAP_SYS_NICE capability to enforce an RT scheduling policy on applications.

The idea is to let Barbeque dynamically decide whether or not to enforce
mlockall() for RT tasks, according to the available memory resources. I can
do this using a sort of synchronization mechanism: Barbeque sets the limits
of the process and signals the rtlib to execute mlockall() or munlockall(),
but I think it would be better to have a syscall that Barbeque can call
directly without interfering with process execution.

Yesterday I quickly read the code of mlockall() and related functions, and I
think that to add a pid parameter it may be sufficient to convert the pid
into a task_struct and replace `current` with it. It will probably not be
that easy. Tomorrow I'm going to read the code in more detail and check
whether the implementation is actually easy and does not involve too much
refactoring of the existing code.

Thank you,
Federico

[-- Attachment #2: Type: text/html, Size: 6553 bytes --]

* Re: mlockall() with pid parameter
  2016-12-08 12:58       ` Federico Reghenzani
@ 2016-12-09 15:36         ` Federico Reghenzani
  0 siblings, 0 replies; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-09 15:36 UTC
  To: linux-mm

[-- Attachment #1.1: Type: text/plain, Size: 481 bytes --]

I attached a patch proposal, it adds mlockall_pid() and munlockall_pid()
syscalls (I've included only the mm/mlock.c file).

I generalized the present code to work on a pointer `p` that in case of
mlockall() and munlockall() corresponds to `current`. Instead, with
mlockall_pid() and munlockall_pid(), after permission checks, it gets the
task_struct from find_task_by_vpid.

I tested the syscalls and they seem ok, but I'm not sure how to test them
thoroughly.

Cheers,
Federico

[-- Attachment #1.2: Type: text/html, Size: 777 bytes --]

[-- Attachment #2: mlock.patch --]
[-- Type: text/x-patch, Size: 4233 bytes --]

diff --git a/mm/mlock.c b/mm/mlock.c
index cdbed8a..f1c4bdc 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -752,17 +752,17 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
  * is called once including the MCL_FUTURE flag and then a second time without
  * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
  */
-static int apply_mlockall_flags(int flags)
+static int apply_mlockall_flags(struct task_struct *p, int flags)
 {
 	struct vm_area_struct * vma, * prev = NULL;
 	vm_flags_t to_add = 0;
 
-	current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+	p->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
 	if (flags & MCL_FUTURE) {
-		current->mm->def_flags |= VM_LOCKED;
+		p->mm->def_flags |= VM_LOCKED;
 
 		if (flags & MCL_ONFAULT)
-			current->mm->def_flags |= VM_LOCKONFAULT;
+			p->mm->def_flags |= VM_LOCKONFAULT;
 
 		if (!(flags & MCL_CURRENT))
 			goto out;
@@ -774,7 +774,7 @@ static int apply_mlockall_flags(int flags)
 			to_add |= VM_LOCKONFAULT;
 	}
 
-	for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
+	for (vma = p->mm->mmap; vma ; vma = prev->vm_next) {
 		vm_flags_t newflags;
 
 		newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
@@ -788,7 +788,7 @@ static int apply_mlockall_flags(int flags)
 	return 0;
 }
 
-SYSCALL_DEFINE1(mlockall, int, flags)
+static int _mlockall(struct task_struct *p, int flags)
 {
 	unsigned long lock_limit;
 	int ret;
@@ -805,31 +805,132 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (down_write_killable(&p->mm->mmap_sem))
 		return -EINTR;
 
 	ret = -ENOMEM;
-	if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
+	if (!(flags & MCL_CURRENT) || (p->mm->total_vm <= lock_limit) ||
 	    capable(CAP_IPC_LOCK))
-		ret = apply_mlockall_flags(flags);
-	up_write(&current->mm->mmap_sem);
+		ret = apply_mlockall_flags(p, flags);
+	up_write(&p->mm->mmap_sem);
 	if (!ret && (flags & MCL_CURRENT))
 		mm_populate(0, TASK_SIZE);
 
 	return ret;
 }
 
-SYSCALL_DEFINE0(munlockall)
+static int _munlockall(struct task_struct *p)
 {
 	int ret;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (down_write_killable(&p->mm->mmap_sem))
 		return -EINTR;
-	ret = apply_mlockall_flags(0);
-	up_write(&current->mm->mmap_sem);
+	ret = apply_mlockall_flags(p, 0);
+	up_write(&p->mm->mmap_sem);
+
+	return ret;
+}
+
+static bool check_same_owner(struct task_struct *p)
+{
+	const struct cred *cred = current_cred(), *pcred;
+	bool match;
+
+	rcu_read_lock();
+	pcred = __task_cred(p);
+	match = (uid_eq(cred->euid, pcred->euid) ||
+		 uid_eq(cred->euid, pcred->uid));
+	rcu_read_unlock();
+	return match;
+}
+
+/*
+ * Check the permission to exec the mlockall_pid and munlockall_pid and write
+ * the struct corresponding to the pid provided.
+ */
+static int check_and_get_process(pid_t pid, struct task_struct **p)
+{
+	*p = NULL;
+
+	if (pid < 0)
+		return -EINVAL;
+
+	if (pid == 0) {
+		*p = current;
+		return 0;
+	}
+
+	rcu_read_lock();
+	*p = find_task_by_vpid(pid);
+
+	if (*p == NULL) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+
+	if ((*p)->flags & PF_KTHREAD) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	/* Prevent p going away */
+	get_task_struct(*p);
+	rcu_read_unlock();
+
+	if (!check_same_owner(*p) && !capable(CAP_IPC_LOCK)) {
+		put_task_struct(*p);
+		return -EPERM;
+	}
+
+	return 0;
+}
+
+SYSCALL_DEFINE1(mlockall, int, flags)
+{
+	return _mlockall(current, flags);
+}
+
+SYSCALL_DEFINE0(munlockall)
+{
+	return _munlockall(current);
+}
+
+SYSCALL_DEFINE2(mlockall_pid, pid_t, pid, int, flags)
+{
+	int ret;
+	struct task_struct *p;
+
+	ret = check_and_get_process(pid, &p);
+
+	if (ret)
+		return ret;
+
+	ret = _mlockall(p, flags);
+
+	if (p != current)
+		put_task_struct(p);
+
 	return ret;
 }
 
+SYSCALL_DEFINE1(munlockall_pid, pid_t, pid)
+{
+	int ret;
+	struct task_struct *p;
+
+	ret = check_and_get_process(pid, &p);
+	if (ret)
+		return ret;
+
+	ret = _munlockall(p);
+
+	if (p != current)
+		put_task_struct(p);
+
+	return ret;
+}
+
+
 /*
  * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB
  * shm segments) get accounted against the user_struct instead.

* Re: mlockall() with pid parameter
  2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
  2016-12-07 16:21 ` Vlastimil Babka
@ 2016-12-07 17:40 ` Dave Hansen
  2016-12-07 18:42 ` Kirill A. Shutemov
  2 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2016-12-07 17:40 UTC
  To: Federico Reghenzani, linux-mm, Vlastimil Babka

On 12/07/2016 07:39 AM, Federico Reghenzani wrote:
> What I would like to have is a syscall that accept a "pid", so a process
> spawned by root would be able to enforce the memory locking to other
> non-root processes. The prototypes would be:
>
> int mlockall(int flags, pid_t pid);
> int munlockall(pid_t pid);

The prototypes don't really tell enough of the story to give you good
feedback. For instance, whose rlimit do these count against? Are all
the MCL_CURRENT/FUTURE/FAULT flags supported?

I think you need to start implementing something to actually see how
ugly this gets in practice.

* Re: mlockall() with pid parameter
  2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
  2016-12-07 16:21 ` Vlastimil Babka
  2016-12-07 17:40 ` Dave Hansen
@ 2016-12-07 18:42 ` Kirill A. Shutemov
  2 siblings, 0 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2016-12-07 18:42 UTC
  To: Federico Reghenzani; +Cc: linux-mm

On Wed, Dec 07, 2016 at 04:39:13PM +0100, Federico Reghenzani wrote:
> Hello,
>
> I'm working on Real-Time applications in Linux. `mlockall()` is a typical
> syscall used in RT processes in order to avoid page faults. However, the
> use of this syscall is strongly limited by ulimits, so basically all RT
> processes that want to call `mlockall()` have to be executed with root
> privileges.

For raising rlimits, you don't really need full root, only
CAP_SYS_RESOURCE (I'm not sure if it's any safer than full root in
practice).

It gives one other possible approach: set the capability for the binary.
Real-time processes are already somewhat privileged, right? I mean
CAP_SYS_NICE.

-- 
 Kirill A. Shutemov

end of thread, other threads:[~2016-12-09 15:37 UTC | newest]

Thread overview: 8+ messages
2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
2016-12-07 16:21 ` Vlastimil Babka
2016-12-07 16:33   ` Federico Reghenzani
2016-12-07 20:01     ` Vlastimil Babka
2016-12-08 12:58       ` Federico Reghenzani
2016-12-09 15:36         ` Federico Reghenzani
2016-12-07 17:40 ` Dave Hansen
2016-12-07 18:42 ` Kirill A. Shutemov