* mlockall() with pid parameter
@ 2016-12-07 15:39 Federico Reghenzani
2016-12-07 16:21 ` Vlastimil Babka
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-07 15:39 UTC (permalink / raw)
To: linux-mm
Hello,
I'm working on Real-Time applications in Linux. `mlockall()` is a typical
syscall used in RT processes in order to avoid page faults. However, the
use of this syscall is strongly limited by ulimits, so basically all RT
processes that want to call `mlockall()` have to be executed with root
privileges.
What I would like to have is a syscall that accepts a "pid", so that a
process spawned by root would be able to enforce memory locking on other
non-root processes. The prototypes would be:
int mlockall(int flags, pid_t pid);
int munlockall(pid_t pid);
I checked the source code and it seems to me quite easy to add this syscall
variant.
I'm writing here to get some feedback before starting to edit the code. Do
you think that this is a good approach?
Thank you,
Federico
--
*Federico Reghenzani*
PhD Candidate
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlockall() with pid parameter
2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
@ 2016-12-07 16:21 ` Vlastimil Babka
2016-12-07 16:33 ` Federico Reghenzani
2016-12-07 17:40 ` Dave Hansen
2016-12-07 18:42 ` Kirill A. Shutemov
2 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-12-07 16:21 UTC (permalink / raw)
To: Federico Reghenzani, linux-mm
On 12/07/2016 04:39 PM, Federico Reghenzani wrote:
> Hello,
>
> I'm working on Real-Time applications in Linux. `mlockall()` is a
> typical syscall used in RT processes in order to avoid page faults.
> However, the use of this syscall is strongly limited by ulimits, so
> basically all RT processes that want to call `mlockall()` have to be
> executed with root privileges.
Is it not possible to change the ulimits with e.g. prlimit?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: mlockall() with pid parameter
2016-12-07 16:21 ` Vlastimil Babka
@ 2016-12-07 16:33 ` Federico Reghenzani
2016-12-07 20:01 ` Vlastimil Babka
0 siblings, 1 reply; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-07 16:33 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Federico Reghenzani, linux-mm
2016-12-07 17:21 GMT+01:00 Vlastimil Babka <vbabka@suse.cz>:
> On 12/07/2016 04:39 PM, Federico Reghenzani wrote:
> > Hello,
> >
> > I'm working on Real-Time applications in Linux. `mlockall()` is a
> > typical syscall used in RT processes in order to avoid page faults.
> > However, the use of this syscall is strongly limited by ulimits, so
> > basically all RT processes that want to call `mlockall()` have to be
> > executed with root privileges.
>
> Is it not possible to change the ulimits with e.g. prlimit?
>
>
Yes, but it requires synchronization between the non-root process and the
root process, because the root process has to change the limits before the
non-root process executes mlockall().
Just to provide an example, another syscall used in RT tasks is
sched_setscheduler(), which is also restricted by ulimits, but it accepts
a pid, so a root process can enforce the scheduling policy on any other
process.
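For comparison, a minimal sketch of the existing pid-based pattern with sched_setscheduler(). The helper names `clamp_fifo_priority` and `make_fifo` are hypothetical, and the caller is assumed to hold CAP_SYS_NICE when targeting another user's process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    assert(lo >= 0 && hi >= lo); /* both queries succeed for SCHED_FIFO */
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Put the process `pid` under SCHED_FIFO at the (clamped) priority.
 * Returns 0 on success, -1 with errno set (e.g. EPERM) on failure. */
int make_fifo(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

The target process does not take part in the call at all, which is the property the proposed pid variants of mlockall() and munlockall() would try to mirror.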
--
*Federico Reghenzani*
PhD Candidate
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria
* Re: mlockall() with pid parameter
2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
2016-12-07 16:21 ` Vlastimil Babka
@ 2016-12-07 17:40 ` Dave Hansen
2016-12-07 18:42 ` Kirill A. Shutemov
2 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2016-12-07 17:40 UTC (permalink / raw)
To: Federico Reghenzani, linux-mm, Vlastimil Babka
On 12/07/2016 07:39 AM, Federico Reghenzani wrote:
> What I would like to have is a syscall that accept a "pid", so a process
> spawned by root would be able to enforce the memory locking to other
> non-root processes. The prototypes would be:
>
> int mlockall(int flags, pid_t pid);
> int munlockall(pid_t pid);
The prototypes don't really tell enough of the story to give you good
feedback. For instance, whose rlimit do these count against? Are all
the MCL_CURRENT/FUTURE/ONFAULT flags supported?
I think you need to start implementing something to actually see how
ugly this gets in practice.
* Re: mlockall() with pid parameter
2016-12-07 15:39 mlockall() with pid parameter Federico Reghenzani
2016-12-07 16:21 ` Vlastimil Babka
2016-12-07 17:40 ` Dave Hansen
@ 2016-12-07 18:42 ` Kirill A. Shutemov
2 siblings, 0 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2016-12-07 18:42 UTC (permalink / raw)
To: Federico Reghenzani; +Cc: linux-mm
On Wed, Dec 07, 2016 at 04:39:13PM +0100, Federico Reghenzani wrote:
> Hello,
>
> I'm working on Real-Time applications in Linux. `mlockall()` is a typical
> syscall used in RT processes in order to avoid page faults. However, the
> use of this syscall is strongly limited by ulimits, so basically all RT
> processes that want to call `mlockall()` have to be executed with root
> privileges.
For raising rlimits, you don't really need full root, only
CAP_SYS_RESOURCE (I'm not sure if it's any safer than full root in
practice).
That suggests one other possible approach: set the capability on the binary.
Real-time processes are already somewhat privileged, right?
I mean CAP_SYS_NICE.
--
Kirill A. Shutemov
* Re: mlockall() with pid parameter
2016-12-07 16:33 ` Federico Reghenzani
@ 2016-12-07 20:01 ` Vlastimil Babka
2016-12-08 12:58 ` Federico Reghenzani
0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-12-07 20:01 UTC (permalink / raw)
To: Federico Reghenzani; +Cc: linux-mm
On 12/07/2016 05:33 PM, Federico Reghenzani wrote:
>
>
> 2016-12-07 17:21 GMT+01:00 Vlastimil Babka <vbabka@suse.cz
> <mailto:vbabka@suse.cz>>:
>
> On 12/07/2016 04:39 PM, Federico Reghenzani wrote:
> > Hello,
> >
> > I'm working on Real-Time applications in Linux. `mlockall()` is a
> > typical syscall used in RT processes in order to avoid page faults.
> > However, the use of this syscall is strongly limited by ulimits, so
> > basically all RT processes that want to call `mlockall()` have to be
> > executed with root privileges.
>
> Is it not possible to change the ulimits with e.g. prlimit?
>
>
> Yes, but it requires a synchronization between non-root process and root
> process.
> Because the root process has to change the limits before the non-root
> process executes the mlockall().
Would it work if you did that between fork() and exec()? If you can
spawn them like this, that is.
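A minimal sketch of that fork()/exec() ordering, assuming the spawner runs with enough privilege to raise the limit. The names `spawn_with_memlock` and `wait_for_exit` are hypothetical, as is the convention that a uid of `(uid_t)-1` means "do not drop privileges".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <grp.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, set RLIMIT_MEMLOCK in the child, optionally drop to uid/gid,
 * then exec argv. Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_with_memlock(rlim_t bytes, uid_t uid, gid_t gid, char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid != 0)
        return pid; /* parent (pid > 0) or fork failure (-1) */

    /* Child: still running with the spawner's privileges here. */
    struct rlimit lim = { .rlim_cur = bytes, .rlim_max = bytes };
    if (setrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        _exit(126);

    /* Drop privileges only if requested: groups first, then the user. */
    if (uid != (uid_t)-1) {
        if (setgroups(0, NULL) != 0 || setgid(gid) != 0 || setuid(uid) != 0)
            _exit(126);
    }

    execvp(argv[0], argv); /* resource limits are preserved across exec */
    _exit(127);            /* reached only if exec failed */
}

/* Wait for `pid`; return its exit code, or -1 if it did not exit normally. */
int wait_for_exit(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only helps when the manager is also the process that spawns the application, and the limit is fixed at spawn time unless prlimit() is used later.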
* Re: mlockall() with pid parameter
2016-12-07 20:01 ` Vlastimil Babka
@ 2016-12-08 12:58 ` Federico Reghenzani
2016-12-09 15:36 ` Federico Reghenzani
0 siblings, 1 reply; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-08 12:58 UTC (permalink / raw)
To: Vlastimil Babka, kirill; +Cc: linux-mm
OK, these solutions are feasible but not very convenient.
Let me explain better what I'm trying to do. I'm a developer of the Barbeque
Open Source Project <http://bosp.dei.polimi.it/>, which is a run-time
resource manager. It is basically composed of a daemon (barbeque) and a
library (rtlib) linked with user applications. A user starts a process
linked with rtlib, which exchanges some information with Barbeque (e.g. it
requests a performance goal). Barbeque is in charge of assigning resources,
trying to maintain the performance goals of all applications and the
predefined system requirements (e.g. temperatures and power consumption).
When processes start, Barbeque tunes several parameters at run-time: it
creates and sets up CGroups, selects CPU governors and frequencies, etc. In
the case of a real-time process it decides the scheduling policy, the
scheduling parameters, etc. Barbeque runs with root privileges, thus it has
the CAP_SYS_NICE capability needed to enforce an RT scheduling policy on
applications.
The idea is to give Barbeque the possibility to dynamically choose whether
or not to enforce mlockall() for RT tasks, according to the available memory
resources. I can do this using a sort of synchronization mechanism: Barbeque
sets the limits of the process and signals rtlib to execute mlockall() or
munlockall(). However, I think it would be better to have a syscall that
Barbeque can call directly, without interfering with process execution.
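The signal-based synchronization described above could look roughly like the following on the library side. This is only a sketch under assumptions: the use of SIGUSR1/SIGUSR2 and the names `install_lock_handlers` and `service_lock_request` are hypothetical and are not taken from rtlib.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

/* 0 = nothing pending, 1 = lock requested, 2 = unlock requested. */
static volatile sig_atomic_t lock_request = 0;

static void on_lock_signal(int signo)
{
    lock_request = (signo == SIGUSR1) ? 1 : 2;
}

/* Install the handlers: SIGUSR1 asks for mlockall(), SIGUSR2 for
 * munlockall(). Returns 0 on success, -1 on failure. */
int install_lock_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lock_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Called from the application's own loop, outside signal context.
 * Returns 0 if nothing was pending, 1 after a successful lock,
 * 2 after a successful unlock, -1 if the syscall failed. */
int service_lock_request(void)
{
    sigset_t set, old;
    int req;

    /* Block both signals while the pending request is taken and cleared. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGUSR2);
    sigprocmask(SIG_BLOCK, &set, &old);
    req = lock_request;
    lock_request = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);

    assert(req >= 0 && req <= 2);
    if (req == 1)
        return mlockall(MCL_CURRENT | MCL_FUTURE) == 0 ? 1 : -1;
    if (req == 2)
        return munlockall() == 0 ? 2 : -1;
    return 0;
}
```

The locking only happens when the application next reaches `service_lock_request()`, which is the kind of interference with process execution that a direct syscall from the manager would avoid.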
Yesterday I quickly read the code of mlockall() and the related functions,
and I think that, in order to add a pid parameter, it may be sufficient to
convert the pid into a task_struct and replace `current` with it. It will
probably not be that easy. Tomorrow I'm going to read the code in more
detail and check whether the implementation is actually easy and does not
involve too much refactoring of the present code.
Thank you,
Federico
--
*Federico Reghenzani*
PhD Candidate
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria
* Re: mlockall() with pid parameter
2016-12-08 12:58 ` Federico Reghenzani
@ 2016-12-09 15:36 ` Federico Reghenzani
0 siblings, 0 replies; 8+ messages in thread
From: Federico Reghenzani @ 2016-12-09 15:36 UTC (permalink / raw)
To: linux-mm
I attached a patch proposal; it adds the mlockall_pid() and munlockall_pid()
syscalls (I've included only the mm/mlock.c file).
I generalized the present code to work on a pointer `p` which, in the case of
mlockall() and munlockall(), corresponds to `current`.
With mlockall_pid() and munlockall_pid(), instead, check_and_get_process()
looks up the task_struct with find_task_by_vpid() and performs the
permission checks.
I tested the syscalls and they seem OK, but I'm not sure how to test them
thoroughly.
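One possible way to check the effect from user space is to watch the `VmLck` field of /proc/<pid>/status for the target process before and after the call. The sketch below only covers that observation side; the helper names `parse_vmlck_kb` and `locked_kb` are hypothetical, and nothing here invokes the proposed syscalls themselves.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the VmLck value (in kB) from the text of a /proc/<pid>/status
 * file. Returns -1 if the field is missing or malformed. */
long parse_vmlck_kb(const char *text)
{
    const char *p = text;

    assert(text != NULL);
    while (p && *p) {
        if (strncmp(p, "VmLck:", 6) == 0) {
            long kb;
            return sscanf(p + 6, "%ld", &kb) == 1 ? kb : -1;
        }
        p = strchr(p, '\n'); /* advance to the start of the next line */
        if (p)
            p++;
    }
    return -1;
}

/* Read /proc/<pid>/status and return the locked memory of `pid` in kB,
 * or -1 if the file cannot be read or has no VmLck field. */
long locked_kb(pid_t pid)
{
    char path[64], buf[8192];
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmlck_kb(buf);
}
```

A test could record `locked_kb(target)` before the call, invoke the new syscall on the target, and expect the value to grow after a lock with MCL_CURRENT and to return to zero after the unlock.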
Cheers,
Federico
[-- Attachment #2: mlock.patch --]
[-- Type: text/x-patch, Size: 4233 bytes --]
diff --git a/mm/mlock.c b/mm/mlock.c
index cdbed8a..f1c4bdc 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -752,17 +752,17 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
* is called once including the MCL_FUTURE flag and then a second time without
* it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
*/
-static int apply_mlockall_flags(int flags)
+static int apply_mlockall_flags(struct task_struct *p, int flags)
{
struct vm_area_struct * vma, * prev = NULL;
vm_flags_t to_add = 0;
- current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+ p->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
if (flags & MCL_FUTURE) {
- current->mm->def_flags |= VM_LOCKED;
+ p->mm->def_flags |= VM_LOCKED;
if (flags & MCL_ONFAULT)
- current->mm->def_flags |= VM_LOCKONFAULT;
+ p->mm->def_flags |= VM_LOCKONFAULT;
if (!(flags & MCL_CURRENT))
goto out;
@@ -774,7 +774,7 @@ static int apply_mlockall_flags(int flags)
to_add |= VM_LOCKONFAULT;
}
- for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
+ for (vma = p->mm->mmap; vma ; vma = prev->vm_next) {
vm_flags_t newflags;
newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
@@ -788,7 +788,7 @@ static int apply_mlockall_flags(int flags)
return 0;
}
-SYSCALL_DEFINE1(mlockall, int, flags)
+static int _mlockall(struct task_struct *p, int flags)
{
unsigned long lock_limit;
int ret;
@@ -805,31 +805,132 @@ SYSCALL_DEFINE1(mlockall, int, flags)
lock_limit = rlimit(RLIMIT_MEMLOCK);
lock_limit >>= PAGE_SHIFT;
- if (down_write_killable(&current->mm->mmap_sem))
+ if (down_write_killable(&p->mm->mmap_sem))
return -EINTR;
ret = -ENOMEM;
- if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
+ if (!(flags & MCL_CURRENT) || (p->mm->total_vm <= lock_limit) ||
capable(CAP_IPC_LOCK))
- ret = apply_mlockall_flags(flags);
- up_write(&current->mm->mmap_sem);
+ ret = apply_mlockall_flags(p, flags);
+ up_write(&p->mm->mmap_sem);
if (!ret && (flags & MCL_CURRENT))
mm_populate(0, TASK_SIZE);
return ret;
}
-SYSCALL_DEFINE0(munlockall)
+static int _munlockall(struct task_struct *p)
{
int ret;
- if (down_write_killable(&current->mm->mmap_sem))
+ if (down_write_killable(&p->mm->mmap_sem))
return -EINTR;
- ret = apply_mlockall_flags(0);
- up_write(&current->mm->mmap_sem);
+ ret = apply_mlockall_flags(p, 0);
+ up_write(&p->mm->mmap_sem);
+
+ return ret;
+}
+
+static bool check_same_owner(struct task_struct *p)
+{
+ const struct cred *cred = current_cred(), *pcred;
+ bool match;
+
+ rcu_read_lock();
+ pcred = __task_cred(p);
+ match = (uid_eq(cred->euid, pcred->euid) ||
+ uid_eq(cred->euid, pcred->uid));
+ rcu_read_unlock();
+ return match;
+}
+
+/*
+ * Check the permission to exec the mlockall_pid and munlockall_pid and write
+ * the struct corresponding to the pid provided.
+ */
+static int check_and_get_process(pid_t pid, struct task_struct **p)
+{
+ *p = NULL;
+
+ if (pid < 0)
+ return -EINVAL;
+
+ if (pid == 0) {
+ *p = current;
+ return 0;
+ }
+
+ rcu_read_lock();
+ *p = find_task_by_vpid(pid);
+
+ if (*p == NULL) {
+ rcu_read_unlock();
+ return -ESRCH;
+ }
+
+ if ((*p)->flags & PF_KTHREAD) {
+ rcu_read_unlock();
+ return -EINVAL;
+ }
+
+ /* Prevent p going away */
+ get_task_struct(*p);
+ rcu_read_unlock();
+
+ if (!check_same_owner(*p) && !capable(CAP_IPC_LOCK)) {
+ put_task_struct(*p);
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+SYSCALL_DEFINE1(mlockall, int, flags)
+{
+ return _mlockall(current, flags);
+}
+
+SYSCALL_DEFINE0(munlockall)
+{
+ return _munlockall(current);
+}
+
+SYSCALL_DEFINE2(mlockall_pid, pid_t, pid, int, flags)
+{
+ int ret;
+ struct task_struct *p;
+
+ ret = check_and_get_process(pid, &p);
+
+ if (ret)
+ return ret;
+
+ ret = _mlockall(p, flags);
+
+ if (p != current)
+ put_task_struct(p);
+
return ret;
}
+SYSCALL_DEFINE1(munlockall_pid, pid_t, pid)
+{
+ int ret;
+ struct task_struct *p;
+
+ ret = check_and_get_process(pid, &p);
+ if (ret)
+ return ret;
+
+ ret = _munlockall(p);
+
+ if (p != current)
+ put_task_struct(p);
+
+ return ret;
+}
+
+
/*
* Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB
* shm segments) get accounted against the user_struct instead.