All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	tglx@linutronix.de, mingo@kernel.org, bigeasy@linutronix.de,
	umgwanakikbuti@gmail.com, paulmck@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org, kmo@daterainc.com
Subject: Re: [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock
Date: Tue, 22 Mar 2016 17:41:22 +0100	[thread overview]
Message-ID: <20160322164122.GS6344@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160322144537.GE3921@osiris>

On Tue, Mar 22, 2016 at 03:45:37PM +0100, Heiko Carstens wrote:

> Sure, looks nice and makes a lot of sense. And the text looks a bit familiar
> to me ;)
> 
> Could you provide From: and Signed-off-by: lines?

Of course, find below.

---
Subject: s390: Clarify pagefault interrupt
From: Peter Zijlstra <peterz@infradead.org>

While looking at set_task_state() users I stumbled over the s390 pfault
interrupt code.  Since Heiko provided a great explanation on how it
worked, I figured we ought to preserve this.

Also make a few little tweaks to the code to aid in readability and
explicitly comment the unusual blocking scheme.

Based-on-text-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/s390/mm/fault.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 791a4146052c..52cc8c99e62c 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -629,6 +629,29 @@ void pfault_fini(void)
 static DEFINE_SPINLOCK(pfault_lock);
 static LIST_HEAD(pfault_list);
 
+#define PF_COMPLETE	0x0080
+
+/*
+ * The mechanism of our pfault code: if Linux is running as guest, runs a user
+ * space process and the user space process accesses a page that the host has
+ * paged out we get a pfault interrupt.
+ *
+ * This allows us, within the guest, to schedule a different process. Without
+ * this mechanism the host would have to suspend the whole virtual CPU until
+ * the page has been paged in.
+ *
+ * So when we get such an interrupt then we set the state of the current task
+ * to uninterruptible and also set the need_resched flag. Both happens within
+ * interrupt context(!). If we later on want to return to user space we
+ * recognize the need_resched flag and then call schedule().  It's not very
+ * obvious how this works...
+ *
+ * Of course we have a lot of additional fun with the completion interrupt (->
+ * host signals that a page of a process has been paged in and the process can
+ * continue to run). This interrupt can arrive on any cpu and, since we have
+ * virtual cpus, actually appear before the interrupt that signals that a page
+ * is missing.
+ */
 static void pfault_interrupt(struct ext_code ext_code,
 			     unsigned int param32, unsigned long param64)
 {
@@ -637,14 +660,14 @@ static void pfault_interrupt(struct ext_code ext_code,
 	pid_t pid;
 
 	/*
-	 * Get the external interruption subcode & pfault
-	 * initial/completion signal bit. VM stores this 
-	 * in the 'cpu address' field associated with the
-         * external interrupt. 
+	 * Get the external interruption subcode & pfault initial/completion
+	 * signal bit. VM stores this in the 'cpu address' field associated
+	 * with the external interrupt.
 	 */
 	subcode = ext_code.subcode;
 	if ((subcode & 0xff00) != __SUBCODE_MASK)
 		return;
+
 	inc_irq_stat(IRQEXT_PFL);
 	/* Get the token (= pid of the affected task). */
 	pid = param64 & LPP_PFAULT_PID_MASK;
@@ -655,8 +678,9 @@ static void pfault_interrupt(struct ext_code ext_code,
 	rcu_read_unlock();
 	if (!tsk)
 		return;
+
 	spin_lock(&pfault_lock);
-	if (subcode & 0x0080) {
+	if (subcode & PF_COMPLETE) {
 		/* signal bit is set -> a page has been swapped in by VM */
 		if (tsk->thread.pfault_wait == 1) {
 			/* Initial interrupt was faster than the completion
@@ -683,10 +707,10 @@ static void pfault_interrupt(struct ext_code ext_code,
 		/* signal bit not set -> a real page is missing. */
 		if (WARN_ON_ONCE(tsk != current))
 			goto out;
+
 		if (tsk->thread.pfault_wait == 1) {
 			/* Already on the list with a reference: put to sleep */
-			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-			set_tsk_need_resched(tsk);
+			goto block;
 		} else if (tsk->thread.pfault_wait == -1) {
 			/* Completion interrupt was faster than the initial
 			 * interrupt (pfault_wait == -1). Set pfault_wait
@@ -701,7 +725,11 @@ static void pfault_interrupt(struct ext_code ext_code,
 			get_task_struct(tsk);
 			tsk->thread.pfault_wait = 1;
 			list_add(&tsk->thread.list, &pfault_list);
-			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+block:
+			/* Since this must be a userspace fault, there
+			 * is no kernel task state to trample. Rely on the
+			 * return to userspace schedule() to block */
+			__set_current_state(TASK_UNINTERRUPTIBLE);
 			set_tsk_need_resched(tsk);
 		}
 	}

  reply	other threads:[~2016-03-22 16:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-08 18:20 [PATCH -tip 0/3] locking/rtmutex: Another crack at spin on owner Davidlohr Bueso
2016-03-08 18:20 ` [PATCH 1/3] rtmutex: Delete save_state member of struct rt_mutex Davidlohr Bueso
2016-03-08 18:20 ` [PATCH 2/3] rtmutex: Add rt_mutex_init_waiter helper Davidlohr Bueso
2016-03-14 13:16   ` Peter Zijlstra
2016-03-08 18:20 ` [PATCH 3/3] rtmutex: Reduce top-waiter blocking on a lock Davidlohr Bueso
2016-03-14 13:23   ` Peter Zijlstra
2016-03-08 22:05 ` [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock Davidlohr Bueso
2016-03-14 13:40   ` Peter Zijlstra
2016-03-21 18:16     ` Davidlohr Bueso
2016-03-22 10:21       ` Peter Zijlstra
2016-03-22 11:32         ` Heiko Carstens
2016-03-22 12:20           ` Peter Zijlstra
2016-03-22 13:26             ` Heiko Carstens
2016-03-22 13:55               ` Peter Zijlstra
2016-03-22 14:45                 ` Heiko Carstens
2016-03-22 16:41                   ` Peter Zijlstra [this message]
2016-03-22 21:46                     ` Heiko Carstens
2016-03-25  2:30         ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160322164122.GS6344@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=heiko.carstens@de.ibm.com \
    --cc=kmo@daterainc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.