From: Chris Lalancette <clalance@redhat.com>
To: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Latency spike during page_scrub_softirq
Date: Thu, 02 Jul 2009 16:47:23 +0200
Message-ID: <4A4CC87B.7090403@redhat.com>

All,
     This is a topic which has been brought up before, but is still something
that plagues certain machines.  Let me describe the test scenario, the problem
as I see it, and some of the data that I've collected.

TEST
----
The test is on a large AMD NUMA machine with 128GB of memory and 32 cpus (8 x
quad-core), memory interleaved, running RHEL-5.4 Xen (although I believe the
issue probably affects upstream Xen as well).  I install 2 RHEL-5.3 guests, one
with 32GB of memory, and one with 64GB of memory.  On the first guest, I run a
continuous ping (just out to the default gateway).  While that ping test is
running, on the dom0 I do "xm destroy <64GB_guest>".  This takes a while to
complete (as expected), but what is not expected is some huge jumps in the ping
responses on the 32GB domain.  For instance, in the test I'm currently running,
normal ping response time is ~0.5ms, but during the xm destroy of the other
domain the ping response can jump up all the way to 3000 (or more) ms.  Once the
big domain destroy is finished, everything returns to normal.

PROBLEM
-------
From what I can tell, the problem lies in page_scrub_softirq().  As a first
test, I disabled page-scrubbing completely (obviously insecure, but just a
test).  With no page-scrubbing at all, and direct memory freeing in
free_domheap_pages(), no delays of the kind experienced in the original test
were seen.  As a second test, I implemented the page scrubbing inside
free_domheap_pages(), and again, no spikes at all were seen.

I then put things back like they were, and instrumented page_scrub_softirq().
Now, the serialize_lock at the top of the function makes sure only one CPU at a
time comes in here.  However, when I instrumented the rest of the function, I
found that when a CPU was in here doing work, it was spending 80-95% of its
time waiting to get the page_scrub_lock (I have raw numbers, if you want to see
them).
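
To illustrate the kind of measurement described above, here is a minimal
userspace sketch, not the actual Xen instrumentation.  Everything in it is
an assumption made for illustration: the atomic_flag spinlock stands in for
page_scrub_lock, and lock_stats, now_ns(), timed_locked_section(),
fake_scrub_work() and wait_percent() are hypothetical names.  It records how
long lock acquisition took relative to the whole locked section.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for page_scrub_lock (a simple test-and-set spinlock). */
static atomic_flag scrub_lock = ATOMIC_FLAG_INIT;

struct lock_stats {
    uint64_t wait_ns;   /* time spent spinning to acquire the lock */
    uint64_t total_ns;  /* time spent in the instrumented section overall */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Placeholder for the real work done while holding the lock. */
static volatile unsigned long fake_work_counter;
static void fake_scrub_work(void)
{
    fake_work_counter++;
}

/* Run work() under the lock and account for the acquisition time. */
static void timed_locked_section(struct lock_stats *st, void (*work)(void))
{
    uint64_t start, got, end;

    assert(st != NULL && work != NULL);

    start = now_ns();
    while (atomic_flag_test_and_set_explicit(&scrub_lock, memory_order_acquire))
        ;  /* spin until the lock is free */
    got = now_ns();

    work();

    atomic_flag_clear_explicit(&scrub_lock, memory_order_release);
    end = now_ns();

    st->wait_ns  += got - start;
    st->total_ns += end - start;
}

/* Percentage of the instrumented time that was spent waiting for the lock. */
static unsigned int wait_percent(const struct lock_stats *st)
{
    if (st->total_ns == 0)
        return 0;
    return (unsigned int)((st->wait_ns * 100) / st->total_ns);
}
```

Calling timed_locked_section() around each locked region and printing
wait_percent() at the end gives a figure comparable in spirit to the 80-95%
quoted above, though the real numbers came from inside the hypervisor.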

At first I thought this was purely contention with the other page_scrub_lock
user in free_domheap_pages().  However, after changing the
spin_lock(&page_scrub_lock) into a spin_trylock() inside page_scrub_softirq(), I
still saw the spikes in the ping test, even though my instrumentation showed I
was only waiting roughly 20-30% of the time on the spinlock.  So I can't fully
explain the rest of the spike.  Any ideas?  Other things I should probe?
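
The trylock experiment can be sketched roughly as follows.  This is a
userspace illustration under stated assumptions, not the Xen change itself:
scrub_trylock(), scrub_unlock(), pending_pages and try_take_batch() are
hypothetical names, and the atomic_flag lock merely stands in for
page_scrub_lock.  The point is that the consumer backs off immediately when
the lock is busy instead of spinning on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for page_scrub_lock. */
static atomic_flag scrub_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if someone else holds it. */
static bool scrub_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&scrub_lock, memory_order_acquire);
}

static void scrub_unlock(void)
{
    atomic_flag_clear_explicit(&scrub_lock, memory_order_release);
}

/* Hypothetical count of pages queued for scrubbing, protected by scrub_lock. */
static unsigned long pending_pages;

/*
 * Try to claim up to max pages from the queue.  Returns the number of
 * pages claimed, or 0 if the lock was busy (the caller retries later
 * rather than spinning).
 */
static unsigned long try_take_batch(unsigned long max)
{
    unsigned long n;

    if (!scrub_trylock())
        return 0;

    n = pending_pages < max ? pending_pages : max;
    pending_pages -= n;

    scrub_unlock();
    return n;
}
```

Note that a return value of 0 is ambiguous here (lock busy or queue empty);
a real implementation would probably want to distinguish the two.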

SOLUTION
--------
There are a couple of solutions that I can think of:
1)  Just clear the pages inside free_domheap_pages().  I tried this with a 64GB
guest as mentioned above, and I didn't see any ill effects from doing so.  It
seems like this might actually be a valid way to go, although then a single CPU
is doing all of the work of freeing the pages (might be a problem on UP systems).
2)  Clear the pages inside free_domheap_pages(), but do some kind of yield every
once in a while.  I don't know how feasible this would be.
3)  Do a lockless FIFO between free_domheap_pages() and page_scrub_softirq()
(since that is all it really is).  While this would certainly work, it seems
like a bit of overengineering for this problem.
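
As a rough illustration of option 2, the following userspace sketch clears
pages in-line but reaches a yield point after every fixed-size batch.  All
names and constants here are assumptions for the sake of the example:
SCRUB_PAGE_SIZE, SCRUB_BATCH, scrub_pages_with_yield() and the yield_hook
callback are hypothetical, and how a yield would actually be done inside
the hypervisor is exactly the open feasibility question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCRUB_PAGE_SIZE 4096u  /* assumed page size */
#define SCRUB_BATCH     16u    /* hypothetical: pages cleared between yield points */

/*
 * Clear nr_pages contiguous pages starting at pages.  After every
 * SCRUB_BATCH pages (except at the very end) a yield point is reached
 * and yield_hook, if non-NULL, is called.  Returns the number of yield
 * points reached.
 */
static size_t scrub_pages_with_yield(unsigned char *pages, size_t nr_pages,
                                     void (*yield_hook)(void))
{
    size_t i, yields = 0;

    assert(pages != NULL || nr_pages == 0);

    for (i = 0; i < nr_pages; i++) {
        memset(pages + i * SCRUB_PAGE_SIZE, 0, SCRUB_PAGE_SIZE);

        if ((i + 1) % SCRUB_BATCH == 0 && i + 1 < nr_pages) {
            if (yield_hook)
                yield_hook();
            yields++;
        }
    }
    return yields;
}
```

The batch size trades scrubbing throughput against how long other work can
be delayed; the value above is arbitrary.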

Other ideas?  I'm happy to try to implement any of these; I'm just not sure
which we would prefer.

-- 
Chris Lalancette

