public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Phillips <phillips@google.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Pavel Machek <pavel@suse.cz>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [RFC][PATCH] swap over NBD
Date: Fri, 07 Jul 2006 11:44:11 -0700	[thread overview]
Message-ID: <44AEAB7B.1060501@google.com> (raw)
In-Reply-To: <1152282398.12271.30.camel@taijtu>

Hi Peter,

Peter Zijlstra wrote:
> Hi,
> 
> With this patch I cannot manage to deadlock the kernel with swapping
> over NBD. That is, using the now forced NOOP io-sched for the NBD queue.

But please do not propose this patch for inclusion without learning the
real reason for the deadlock.  Does it go back to deadlocking if you run
with NOOP, but just alloc a page before the sync_page and free it after?

> If another io-sched (say CFQ) is used (even with the coalescence limited
> to PAGE_SIZE) the kernel very quickly locks up with the faulting process
> stuck in D state like so:
> 
> 08927a14:  [<0805f8fc>] switch_to_skas+0x3c/0x90
> 08927a2c:  [<0805ba59>] _switch_to+0x49/0xa0
> 08927a50:  [<081fbb02>] schedule+0x2c2/0x590
> 08927aa0:  [<081fc67e>] io_schedule+0xe/0x20
> 08927aa8:  [<080a49c9>] sync_page+0x39/0x50
> 08927ab4:  [<081fc939>] __wait_on_bit_lock+0x49/0x70
> 08927ad4:  [<080a5213>] __lock_page+0x73/0x80
> 08927b04:  [<080b40ac>] do_swap_page+0x1ac/0x300
> 08927b40:  [<080b4867>] __handle_mm_fault+0x107/0x280
> 08927b78:  [<0805e6d2>] handle_page_fault+0x142/0x280
> 08927ba8:  [<0805ea07>] segv+0x177/0x2e0
> 08927c5c:  [<0805e87f>] segv_handler+0x6f/0x80
> 08927c80:  [<080750e3>] user_signal+0x53/0x70
> 08927c98:  [<08074470>] userspace+0x1d0/0x370
> 08927cf8:  [<0805f9fb>] new_thread_handler+0xab/0xc0
> 08927d1c:  [<ffffe420>] _etext+0xf7dfe403/0x0
>         
> and occasionally other processes stuck in D state like so:
> 
> 089d7810:  [<0805f8fc>] switch_to_skas+0x3c/0x90
> 089d7828:  [<0805ba59>] _switch_to+0x49/0xa0
> 089d784c:  [<081fbb02>] schedule+0x2c2/0x590
> 089d789c:  [<081fc67e>] io_schedule+0xe/0x20
> 089d78a4:  [<0816e647>] get_request_wait+0xf7/0x120
> 089d78e8:  [<0816f24f>] __make_request+0x9f/0x430
> 089d792c:  [<0816f771>] generic_make_request+0xf1/0x180
> 089d7970:  [<0816f853>] submit_bio+0x53/0x140
> 089d79d0:  [<080bb60e>] swap_writepage+0x9e/0xd0
> 089d79f4:  [<080aed89>] pageout+0xa9/0x140
> 089d7a3c:  [<080af19a>] shrink_page_list+0x2ba/0x440
> 089d7abc:  [<080af4d4>] shrink_inactive_list+0xb4/0x310
> 089d7b50:  [<080afbfb>] shrink_zone+0x9b/0x100
> 089d7b78:  [<080b0154>] balance_pgdat+0x264/0x390
> 089d7be0:  [<080b030d>] kswapd+0x8d/0xc0
> 089d7c18:  [<08095a86>] kthread+0xb6/0xc0
> 089d7c48:  [<0806f8e7>] run_kernel_thread+0x37/0x60
> 089d7cf8:  [<0805f9e1>] new_thread_handler+0x91/0xc0
> 089d7d1c:  [<ffffe420>] _etext+0xf7dfe403/0x0
> 
> I must admit to not having figured out why the faulting process gets
> stuck on sync_page() like it does. What I initially thought was that it
> might be waiting to lock a page stuck in an elevator plug, however it
> does look like sync_page() calls __generic_unplug_device() after some
> indirection.

This is not about lock_page.  Whatever was supposed to wake up the
process sleeping in io_schedule didn't, perhaps because it was trying
to allocate memory at the time, and still is.

> Also note that with all recorded lockups the free page count is larger
> than the min watermark, and sometimes even larger than the high
> watermark.

The allocating process is probably in shrink_caches which doesn't check
the free page counts, it uses some other mystical criterea to decide
when to stop, so this in itself does not let out the possibility of a
memory allocation deadlock.

What was supposed to do the wakeup, is this supposed to be driven by
an endio completion?  Theory: the io completion wakes up something that
was supposed to wake up the process in IO wait, but which tried to
allocate memory and got blocked forever.

Above we can see what kswapd is doing, it is deadlocked as I would
expect.  Is it sometimes not deadlocked?  Sleeping instead?

> (These traces were obtained using UML, real machines give similar
> results)
> 
> Other than this as of yet unexplained behaviour, the patch does make
> sense in that:
>  - IO scheduling is best done on the server end of NBD;
>  - not coalescing requests keeps the memory needs constant.
>    (without this change one very quickly ends up in the classical
>     VM deadlock)

All true, but.  There is a special place in hell for those who make
bugs go away without knowing what caused them in the first place.  I
am not implying that you are such a person of course.

Regards,

Daniel

      reply	other threads:[~2006-07-07 18:45 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-07 14:26 [RFC][PATCH] swap over NBD Peter Zijlstra
2006-07-07 18:44 ` Daniel Phillips [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44AEAB7B.1060501@google.com \
    --to=phillips@google.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@suse.cz \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox