From: Chris Lalancette <clalance@redhat.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [PATCH]: Fix deadlock in mm_pin
Date: Thu, 20 Nov 2008 19:37:01 +0100
Message-ID: <4925AE4D.2080906@redhat.com>
In-Reply-To: <C54B28E2.29419%keir.fraser@eu.citrix.com>

Keir Fraser wrote:
> On 20/11/08 10:31, "Chris Lalancette" <clalance@redhat.com> wrote:
> 
>> it applies to the 2.6.18 tree as well; the deadlock scenario is below.
>>
>> "After running an arbitrary workload involving network traffic for some time
>> (1-2 days), a xen guest running the 2.6.9-67 x86_64 xenU kernel locks up with
>> both vcpus spinning at 100%.
>>
>> The problem is due to a race between the scheduler and network interrupts.  On
>> one vcpu, the scheduler takes the runqueue spinlock of the other vcpu to
>> schedule a process, and attempts to lock mm_unpinned_lock.  On the other vcpu,
>> another process is holding mm_unpinned_lock (because it is starting or
>> exiting), and is interrupted by a network interrupt.  The network interrupt
>> handler attempts to wake up the same process that the first vcpu is trying to
>> schedule, and will try to get the runqueue spinlock that the first vcpu is
>> already holding."
> 
> I don't believe that mm_unpinned_lock can ever be taken while a runqueue
> lock is already held in 2.6.18. If you can provide a call chain then I'll
> consider the patch -- but I think you'd still be screwed by the
> mm->page_table_lock (also acquired in mm_pin() code, also not IRQ safe, but
> less easy for you to go convert all the users of that lock).
> 
> You might have some backporting from 2.6.18 to do...

Arg.  I think I see what you mean.  In c/s 10343, mm_pin is moved from switch_mm
into activate_mm, which I *think* means that it is no longer called with the
runqueue lock held.  Indeed, the comment on that c/s says it removes a deadlock,
which may be the one the RHEL-4 kernel is running into.  OK, thanks for the
feedback, I'll look at backporting that code.
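The lock cycle in the quoted report can be sketched as a wait-for graph. Each execution context holds at most one lock and spins on at most one other. A deadlock exists when following "spins on a lock held by" edges leads back to the starting context. This is a hypothetical Python model, not Xen or kernel code. The Context class and the deadlocked() helper are invented for illustration. The interrupted task and the network IRQ handler on the second vcpu are treated as one context, because the task cannot release mm_unpinned_lock until the handler returns. mm->page_table_lock is not modeled. The second configuration assumes, as this reply only tentatively concludes, that after c/s 10343 mm_pin() runs from activate_mm() without a runqueue lock held.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Context:
    """One execution context (hypothetical model, not kernel code)."""
    name: str
    held: Optional[str]    # lock currently held, or None
    wanted: Optional[str]  # lock being spun on, or None


def deadlocked(contexts: List[Context], start: int) -> bool:
    """Follow wait-for edges from contexts[start]; True if they lead back to it."""
    # Map each held lock to the index of the context holding it.
    holder = {c.held: i for i, c in enumerate(contexts) if c.held is not None}
    cur = start
    for _ in range(len(contexts)):
        wanted = contexts[cur].wanted
        if wanted is None or wanted not in holder:
            return False  # not spinning, or the lock is free: progress is possible
        cur = holder[wanted]
        if cur == start:
            return True
    return False


# The reported scenario: vcpu0's scheduler holds vcpu1's runqueue lock and
# spins on mm_unpinned_lock; on vcpu1 a network IRQ has interrupted the task
# holding mm_unpinned_lock and spins on that same runqueue lock.
before = [
    Context("vcpu0: scheduler -> mm_pin", held="rq_lock[vcpu1]",
            wanted="mm_unpinned_lock"),
    Context("vcpu1: network IRQ over task start/exit", held="mm_unpinned_lock",
            wanted="rq_lock[vcpu1]"),
]

# Assumption: after c/s 10343, mm_pin runs from activate_mm with no
# runqueue lock held.
after = [
    Context("vcpu0: activate_mm -> mm_pin", held=None,
            wanted="mm_unpinned_lock"),
    Context("vcpu1: network IRQ over task start/exit", held="mm_unpinned_lock",
            wanted="rq_lock[vcpu1]"),
]

for label, scenario in (("before", before), ("after", after)):
    print(label, "deadlock" if deadlocked(scenario, 0) else "no deadlock")
```

Running it reports a deadlock for the first configuration and none for the second. Once vcpu0 no longer holds the runqueue lock, the IRQ handler on vcpu1 can take it and finish. The interrupted task can then release mm_unpinned_lock.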

Chris Lalancette


Thread overview: 3+ messages
2008-11-20 10:31 [PATCH]: Fix deadlock in mm_pin Chris Lalancette
2008-11-20 14:46 ` Keir Fraser
2008-11-20 18:37   ` Chris Lalancette [this message]
