From: NeilBrown <neilb@suse.com>
To: Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] SUNRPC: restore fair scheduling to priority queues.
Date: Wed, 16 Dec 2015 10:44:01 +1100
Message-ID: <87twnjb7lq.fsf@notabene.neil.brown.name>

Commit c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt
higher priority ones") removed the 'fair scheduling' feature from SUNRPC
priority queues.  This feature caused problems for some queues (the send
queue and the session slot queue) but is still needed for others,
particularly the tcp slot queue.

Without fairness, reads (priority 1) can starve background writes
(priority 0), so a streaming read can cause writeback to block
indefinitely.  This is not easy to measure with default settings, as
the current slot table size is much larger than the read-ahead size.
However, if the slot table size is reduced (as seen when backporting to
older kernels, where the size is limited) the problem is easily
demonstrated.

This patch conditionally restores fair scheduling.  Fair scheduling is
now the default for a priority queue; once rpc_sleep_on_priority() is
called directly on a queue, that queue switches to strict priority
observance.

As that function is called for both the send queue and the session
slot queue and not for any others, this has exactly the desired
effect.

The "count" field that was removed by the previous patch is restored.
A value of 255 means "strict priority queuing, no fair queuing".
Any other value is a count of owners to be processed before switching
to a different priority level, just like before.

Signed-off-by: NeilBrown <neilb@suse.com>
---

It is quite possible that you won't like the overloading of
rpc_sleep_on_priority() to disable fair-scheduling and would prefer an
extra arg to rpc_init_priority_wait_queue().  I can do it that way if
you like.
NeilBrown
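
For illustration only (this is not part of the patch): a minimal
user-space sketch of how the restored "count" budget is derived from the
priority level.  The names fair_budget() and STRICT_PRIORITY are
hypothetical; the sketch assumes queue priority indices 0..3 and simply
mirrors the "1 << (priority * 2)" expression and the 255 sentinel used in
the diff.

```c
#include <assert.h>

#define STRICT_PRIORITY 255	/* sentinel from the patch: no fair queuing */

/*
 * Hypothetical helper: returns the value "count" would take after a
 * switch to the given priority level.  A queue already marked strict
 * keeps the sentinel; otherwise the budget is 1, 4, 16 or 64.
 */
static unsigned char fair_budget(unsigned char count, int priority)
{
	/* assumed range; larger shifts would not fit in an unsigned char */
	assert(priority >= 0 && priority <= 3);
	if (count == STRICT_PRIORITY)
		return STRICT_PRIORITY;
	return (unsigned char)(1u << (priority * 2));
}
```

Under these assumptions each level gets four times the budget of the
level below it, and the largest budget (64) stays well clear of the 255
sentinel.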


 include/linux/sunrpc/sched.h |  1 +
 net/sunrpc/sched.c           | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index d703f0ef37d8..985efe8d7e26 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -184,6 +184,7 @@ struct rpc_wait_queue {
 	pid_t			owner;			/* process id of last task serviced */
 	unsigned char		maxpriority;		/* maximum priority (0 if queue is not a priority queue) */
 	unsigned char		priority;		/* current priority */
+	unsigned char		count;			/* # task groups remaining to be serviced */
 	unsigned char		nr;			/* # tasks remaining for cookie */
 	unsigned short		qlen;			/* total # tasks waiting in queue */
 	struct rpc_timer	timer_list;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 73ad57a59989..e8fcd4f098bb 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -117,6 +117,8 @@ static void rpc_set_waitqueue_priority(struct rpc_wait_queue *queue, int priorit
 		rpc_rotate_queue_owner(queue);
 		queue->priority = priority;
 	}
+	if (queue->count != 255)
+		queue->count = 1 << (priority * 2);
 }
 
 static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid)
@@ -144,8 +146,10 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
 	INIT_LIST_HEAD(&task->u.tk_wait.links);
 	if (unlikely(queue_priority > queue->maxpriority))
 		queue_priority = queue->maxpriority;
-	if (queue_priority > queue->priority)
-		rpc_set_waitqueue_priority(queue, queue_priority);
+	if (queue->count == 255) {
+		if (queue_priority > queue->priority)
+			rpc_set_waitqueue_priority(queue, queue_priority);
+	}
 	q = &queue->tasks[queue_priority];
 	list_for_each_entry(t, q, u.tk_wait.list) {
 		if (t->tk_owner == task->tk_owner) {
@@ -401,6 +405,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *q, struct rpc_task *task,
 	 * Protect the queue operations.
 	 */
 	spin_lock_bh(&q->lock);
+	q->count = 255;
 	__rpc_sleep_on_priority(q, task, action, priority - RPC_PRIORITY_LOW);
 	spin_unlock_bh(&q->lock);
 }
@@ -478,7 +483,8 @@ static struct rpc_task *__rpc_find_next_queued_priority(struct rpc_wait_queue *q
 		/*
 		 * Check if we need to switch queues.
 		 */
-		goto new_owner;
+		if (queue->count == 255 || --queue->count)
+			goto new_owner;
 	}
 
 	/*
-- 
2.6.3
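
For illustration only (not part of the patch): a simplified user-space
model of the selection logic, useful for seeing why strict priority can
starve level 0 while the count budget cannot.  Everything named model_*
is hypothetical.  The model assumes two priority levels, ignores owners
and the per-owner "nr" batching entirely, and treats each pick as
consuming one unit of "count".

```c
#include <assert.h>

#define NPRIO	2	/* assumed: level 0 = writeback, level 1 = reads */
#define STRICT	255	/* sentinel from the patch */

struct model_queue {
	unsigned int pending[NPRIO];	/* tasks waiting at each level */
	int priority;			/* level currently being serviced */
	unsigned char count;		/* remaining budget, or STRICT */
};

static void model_set_priority(struct model_queue *q, int prio)
{
	q->priority = prio;
	if (q->count != STRICT)
		q->count = (unsigned char)(1u << (prio * 2));
}

/* Queue a task; only a strict queue jumps to a higher level on arrival. */
static void model_add(struct model_queue *q, int prio)
{
	q->pending[prio]++;
	if (q->count == STRICT && prio > q->priority)
		model_set_priority(q, prio);
}

/* Pick the next task; returns the level served, or -1 if all are empty. */
static int model_next(struct model_queue *q)
{
	int prio = q->priority;
	int i;

	/* Stay on the current level while it has work and budget remains. */
	if (q->pending[prio] > 0 && (q->count == STRICT || --q->count)) {
		q->pending[prio]--;
		return prio;
	}
	/* Otherwise search downward, wrapping round to the top level. */
	for (i = 0; i < NPRIO; i++) {
		prio = (prio == 0) ? NPRIO - 1 : prio - 1;
		if (q->pending[prio] > 0) {
			model_set_priority(q, prio);
			q->pending[prio]--;
			return prio;
		}
	}
	model_set_priority(q, NPRIO - 1);
	return -1;
}

/* Keep both levels backlogged for 'rounds' picks; count level-0 picks. */
static unsigned int model_low_served(unsigned char mode, unsigned int rounds)
{
	struct model_queue q = { .pending = { 0, 0 }, .count = mode };
	unsigned int low = 0, i;

	model_set_priority(&q, NPRIO - 1);
	for (i = 0; i < rounds; i++) {
		model_add(&q, 0);
		model_add(&q, 1);
		if (model_next(&q) == 0)
			low++;
	}
	return low;
}
```

In this model, 100 picks with both levels permanently backlogged serve
level 0 zero times when the queue is strict (mode 255) and 20 times when
it is fair (any other mode), which is the 4:1 ratio implied by the
budgets.  The real queue also batches by owner, so actual behaviour will
differ in detail.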
