From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Darren Hart <darren@dvhart.com>,
Steven Rostedt <rostedt@goodmis.org>,
fredrik.markstrom@windriver.com,
Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH v2] ipc/mqueue: remove STATE_PENDING
Date: Fri, 10 Apr 2015 16:37:26 +0200 [thread overview]
Message-ID: <20150410143726.GD3057@linutronix.de> (raw)
In-Reply-To: <55241851.7060704@colorfullife.com>
This patch moves the wakeup_process() invocation so it is not done under
the info->lock. With this change, the waiter is woken up once it is
"ready" which means its state is STATE_READY and it does not need to loop
on SMP if it is still in STATE_PENDING.
In the timeout case we still need to grab the info->lock to verify the state.
This change should also avoid the introduction of preempt_disable() in
-RT which avoids a busy-loop which pools for the STATE_PENDING -> STATE_READY
change if the waiter has a higher priority compared to the waker.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: update / correct the comment of how the algorithm works now.
* Manfred Spraul | 2015-04-07 19:48:01 [+0200]:
>No. With your change, ipc/sem.c and ipc/msg.c use different algorithms.
>Please update the comment and describe the new approach:
>
>Current approach:
>- set pointer to message
>- STATE_PENDING
>- wake_up_process()
>- STATE_READY
> (now the receiver can continue)
>
>New approach:
>- set pointer to message
>- get_task_struct
>- STATE_READY
> (now the receiver can continue, e.g. woken up due to an unrelated
>SIGKILL)
>- wake_up_process()
>- put_task_struct()
Updated. Does it meet your expectation?
>
>>+ if (r_sender) {
>>+ wake_up_process(r_sender);
>>+ put_task_struct(r_sender);
>>+ }
>> ret = 0;
>Could you double-check that it is safe to call wake_up_process on a
>killed and reaped thread, only with a get_task_struct reference?
tglx answered that part.
>And: please test it, too. (patch the kernel so that you can trigger
>this case).
Why patch? Isn't this triggered if you have a reader waiting and you
send a message?
ipc/mqueue.c | 51 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 17 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 7635a1cf99f3..bfdc2219a470 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -47,7 +47,6 @@
#define RECV 1
#define STATE_NONE 0
-#define STATE_PENDING 1
#define STATE_READY 2
struct posix_msg_tree_node {
@@ -577,9 +576,6 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr,
time = schedule_hrtimeout_range_clock(timeout, 0,
HRTIMER_MODE_ABS, CLOCK_REALTIME);
- while (ewp->state == STATE_PENDING)
- cpu_relax();
-
if (ewp->state == STATE_READY) {
retval = 0;
goto out;
@@ -909,9 +905,14 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
* bypasses the message array and directly hands the message over to the
* receiver.
* The receiver accepts the message and returns without grabbing the queue
- * spinlock. Therefore an intermediate STATE_PENDING state and memory barriers
- * are necessary. The same algorithm is used for sysv semaphores, see
- * ipc/sem.c for more details.
+ * spinlock. The used algorithm is different from sysv semaphores (ipc/sem.c):
+ * - set pointer to message
+ * - hold a reference of the task to be woken up
+ * - update its state (to STATE_READY). Now the receiver can continue.
+ * - wake up the process after the lock is dropped. Should the process wake up
+ * before this wakeup (due to a timeout or a signal) it will either see
+ * STATE_READY and continue or acquire the lock to check the sate again.
+ * - put the reference to task to be woken up.
*
* The same algorithm is used for senders.
*/
@@ -919,36 +920,41 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
/* pipelined_send() - send a message directly to the task waiting in
* sys_mq_timedreceive() (without inserting message into a queue).
*/
-static inline void pipelined_send(struct mqueue_inode_info *info,
+static struct task_struct *pipelined_send(struct mqueue_inode_info *info,
struct msg_msg *message,
struct ext_wait_queue *receiver)
{
+ struct task_struct *r_task;
+
receiver->msg = message;
list_del(&receiver->list);
- receiver->state = STATE_PENDING;
- wake_up_process(receiver->task);
+ r_task = receiver->task;
+ get_task_struct(r_task);
smp_wmb();
receiver->state = STATE_READY;
+ return r_task;
}
/* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
* gets its message and put to the queue (we have one free place for sure). */
-static inline void pipelined_receive(struct mqueue_inode_info *info)
+static struct task_struct *pipelined_receive(struct mqueue_inode_info *info)
{
+ struct task_struct *r_sender;
struct ext_wait_queue *sender = wq_get_first_waiter(info, SEND);
if (!sender) {
/* for poll */
wake_up_interruptible(&info->wait_q);
- return;
+ return NULL;
}
if (msg_insert(sender->msg, info))
- return;
+ return NULL;
list_del(&sender->list);
- sender->state = STATE_PENDING;
- wake_up_process(sender->task);
+ r_sender = sender->task;
+ get_task_struct(r_sender);
smp_wmb();
sender->state = STATE_READY;
+ return r_sender;
}
SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
@@ -961,6 +967,7 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
struct ext_wait_queue *receiver;
struct msg_msg *msg_ptr;
struct mqueue_inode_info *info;
+ struct task_struct *r_task = NULL;
ktime_t expires, *timeout = NULL;
struct timespec ts;
struct posix_msg_tree_node *new_leaf = NULL;
@@ -1049,7 +1056,7 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
} else {
receiver = wq_get_first_waiter(info, RECV);
if (receiver) {
- pipelined_send(info, msg_ptr, receiver);
+ r_task = pipelined_send(info, msg_ptr, receiver);
} else {
/* adds message to the queue */
ret = msg_insert(msg_ptr, info);
@@ -1062,6 +1069,10 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
}
out_unlock:
spin_unlock(&info->lock);
+ if (r_task) {
+ wake_up_process(r_task);
+ put_task_struct(r_task);
+ }
out_free:
if (ret)
free_msg(msg_ptr);
@@ -1149,14 +1160,20 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
msg_ptr = wait.msg;
}
} else {
+ struct task_struct *r_sender;
+
msg_ptr = msg_get(info);
inode->i_atime = inode->i_mtime = inode->i_ctime =
CURRENT_TIME;
/* There is now free space in queue. */
- pipelined_receive(info);
+ r_sender = pipelined_receive(info);
spin_unlock(&info->lock);
+ if (r_sender) {
+ wake_up_process(r_sender);
+ put_task_struct(r_sender);
+ }
ret = 0;
}
if (ret == 0) {
--
2.1.4
next prev parent reply other threads:[~2015-04-10 14:37 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-07 15:03 improve futex on -RT by avoiding the double wake-up Sebastian Andrzej Siewior
2015-04-07 15:03 ` [PATCH 1/3] futex: avoid double wake up in PI futex wait / wake on -RT Sebastian Andrzej Siewior
2015-04-07 18:41 ` Thomas Gleixner
2015-04-10 14:42 ` [PATCH 1/3 v2] " Sebastian Andrzej Siewior
2015-04-07 15:03 ` [PATCH 2/3] futex: avoid double wake up in futex_wake() " Sebastian Andrzej Siewior
2015-04-07 19:47 ` Thomas Gleixner
2015-04-10 16:11 ` [PATCH 2/3 v2] " Sebastian Andrzej Siewior
2015-04-13 3:02 ` Davidlohr Bueso
2015-04-16 5:09 ` Davidlohr Bueso
2015-04-16 9:19 ` Thomas Gleixner
2015-04-16 10:16 ` Peter Zijlstra
2015-04-16 10:49 ` Thomas Gleixner
2015-04-16 14:42 ` Davidlohr Bueso
2015-04-16 15:54 ` Peter Zijlstra
2015-04-16 16:22 ` Davidlohr Bueso
2015-04-07 15:03 ` [PATCH 3/3] ipc/mqueue: remove STATE_PENDING Sebastian Andrzej Siewior
2015-04-07 17:48 ` Manfred Spraul
2015-04-07 18:28 ` Thomas Gleixner
2015-04-10 14:37 ` Sebastian Andrzej Siewior [this message]
2015-04-23 22:18 ` [PATCH v2] " Thomas Gleixner
2015-04-28 3:24 ` Davidlohr Bueso
2015-04-28 12:37 ` Peter Zijlstra
2015-04-28 16:36 ` Davidlohr Bueso
2015-04-28 16:43 ` Peter Zijlstra
2015-04-28 16:59 ` Davidlohr Bueso
2015-04-29 19:44 ` Manfred Spraul
2015-04-30 18:46 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150410143726.GD3057@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=darren@dvhart.com \
--cc=dave@stgolabs.net \
--cc=fredrik.markstrom@windriver.com \
--cc=linux-kernel@vger.kernel.org \
--cc=manfred@colorfullife.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).