From: "J. Bruce Fields" <bfields@fieldses.org>
To: Lyle Seaman <lyleseaman@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Thread scheduling issues
Date: Thu, 14 Oct 2010 13:15:23 -0400
Message-ID: <20101014171522.GA553@fieldses.org>
In-Reply-To: <AANLkTin8tXxRFA5G8c9jKLv6aeEM8YLRX54YT5GozxK+@mail.gmail.com>
On Thu, Oct 14, 2010 at 11:21:33AM -0400, Lyle Seaman wrote:
> Ok. I've been trying to figure out a performance problem with some of
> my production servers (I'm just not able to get enough simultaneous
> I/Os queued up to get decent performance out of my disk subsystem),
> and I have run into something that strikes me as odd. For reference,
> the code I'm looking at is from a 2.6.35.22 kernel, and I know there
> have been some relevant changes since then but I am not seeing the
> discussion of this in the mailing list archives. If I have to figure
> out the new source control system to look at the head-of-line code,
> say so.
>
> in sunrpc/svc_xprt.c:svc_recv() I find this:
>
> /* now allocate needed pages.  If we get a failure, sleep briefly */
> pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
> for (i = 0; i < pages ; i++)
>         while (rqstp->rq_pages[i] == NULL) {
>                 struct page *p = alloc_page(GFP_KERNEL);
>                 if (!p) {
>                         set_current_state(TASK_INTERRUPTIBLE);
>                         if (signalled() || kthread_should_stop()) {
>                                 set_current_state(TASK_RUNNING);
>                                 return -EINTR;
>                         }
>                         schedule_timeout(msecs_to_jiffies(500));
>                 }
>                 rqstp->rq_pages[i] = p;
>         }
>
> First of all, 500ms is a long way from "brief". Second, it seems like
> a really bad idea to sleep while holding a contended resource.
> Shouldn't you drop those pages before sleeping? It's been a long time
> since I knew anything about the Linux vm system, but presumably the
> reason that alloc_page has failed is that there aren't 26 pages
> available*. So now I'm going to sleep while holding, say, the last 24
> free pages? This looks like deadlock city, except for the valiant
> efforts by the vm system to keep freeing pages, and other adjustments
> to vary sv_max_mesg depending on the number of threads which mean that
> in practice, deadlock is unlikely. But figuring out the interaction
> of all those other systems requires global knowledge, which is asking
> for trouble in the long run. And there are intermediate degrees of
> cascading, interlocked dependencies that degrade performance without
> technically being a complete deadlock.
>
> (* I'm going to have to do some more digging here, because in my
> specific situation, vmstat generally reports between 100M and 600M
> free, so I'm admittedly not clear on why alloc_page is failing me,
> unless there is a difference between "free" and "available for nfsd".)
OK, I'm more than willing to consider proposals for improvement here,
but I'm losing the connection to your problem: how do you know that this
particular alloc_page is failing in your case, and how do you know
that's the cause of your problem?
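That said, if you want to prototype the drop-before-sleeping idea, I'd
imagine it looks roughly like the following -- completely untested, and
the open-coded put_page() loop is just a stand-in for whatever helper
we'd really want:

        /* now allocate needed pages.  If we get a failure, give back
         * whatever we are already holding, sleep briefly, then retry
         * the whole allocation from the top. */
        pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
retry:
        for (i = 0; i < pages; i++)
                while (rqstp->rq_pages[i] == NULL) {
                        struct page *p = alloc_page(GFP_KERNEL);
                        if (!p) {
                                int j;

                                /* don't sit on partially-allocated pages
                                 * while we wait for memory */
                                for (j = 0; j < pages; j++)
                                        if (rqstp->rq_pages[j]) {
                                                put_page(rqstp->rq_pages[j]);
                                                rqstp->rq_pages[j] = NULL;
                                        }
                                set_current_state(TASK_INTERRUPTIBLE);
                                if (signalled() || kthread_should_stop()) {
                                        set_current_state(TASK_RUNNING);
                                        return -EINTR;
                                }
                                schedule_timeout(msecs_to_jiffies(500));
                                goto retry;
                        }
                        rqstp->rq_pages[i] = p;
                }

Whether that actually helps depends on the answers to the questions
above, though -- if alloc_page() isn't what's failing for you, this is
all beside the point.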
> And then there's this, in svc_xprt_enqueue()
>
> process:
>         /* Work out whether threads are available */
>         thread_avail = !list_empty(&pool->sp_threads); /* threads are asleep */
>         if (pool->sp_nwaking >= SVC_MAX_WAKING) {      // == 5 -lws
>                 /* too many threads are runnable and trying to wake up */
>                 thread_avail = 0;
>                 pool->sp_stats.overloads_avoided++;
>         }
That strikes me as a more likely cause of your problem. But it was
reverted in v2.6.33, and you say you're using v2.6.35.22, so there's
some confusion here?
But again we're jumping to conclusions: what's your setup, what symptoms
are you seeing, what did you try to do to fix it, and what were the
results?
> Now, the sp_nwaking counter is only decremented after that snippet of
> code earlier, so if I've got five threads that happen to all run into
> a transient problem getting the pages they need, they're going to
> sleep for 500 millis and -nothing- is going to happen.
>
> I see that this overloads_avoided branch disappeared in more recent
> kernels; was there some discussion of it that you can point me to so I
> don't have to rehash an old topic? I think that removing it actually
> increases the likelihood of deadlock.
>
> Finally, zero-copy is great, awesome, I love it. But it seems
> profligate to allocate an entire 26 pages for every operation when
> only 9% of them (at least in my workload) are writes. All the rest of
> the operations only need a small fraction of that to be preallocated.
> I don't have a concrete suggestion here; I'd have to do some tinkering
> with the code first. I'm just... saying. Maybe allocate two pages up
> front, then wait to see if the others are needed and allocate them at
> that time. If you're willing to sleep in svc_recv while holding lots
> of pages, it's no worse to sleep while holding one or two.
Could be. I think the code was trying to avoid reading an rpc request
until it was reasonably sure it would have the resources to complete it,
so we didn't end up with a thread stuck waiting on a half-received rpc.
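If you do end up tinkering with the two-pages-up-front idea, I'd guess
the helper ends up looking something like this (untested sketch, and
svc_alloc_arg_pages() is a made-up name, not an existing interface):

        /*
         * Untested sketch: make sure the first @needed slots of rq_pages
         * are populated, and return an error instead of sleeping so the
         * caller can decide what to do while holding fewer pages.
         */
        static int svc_alloc_arg_pages(struct svc_rqst *rqstp,
                                       unsigned int needed)
        {
                unsigned int i;

                for (i = 0; i < needed; i++) {
                        if (rqstp->rq_pages[i])
                                continue; /* left over from the last request */
                        rqstp->rq_pages[i] = alloc_page(GFP_KERNEL);
                        if (!rqstp->rq_pages[i])
                                return -ENOMEM;
                }
                return 0;
        }

svc_recv() would call that with two (or whatever covers a typical
non-write request) before reading from the socket, and something on the
decode path would call it again with the full
(serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE once it sees a large write
-- which is exactly where you run into the half-received-rpc problem
above, so the hard part is deciding what that second caller does on
failure.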
--b.