From: Gleb Natapov <gleb@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>,
	linux-rt-users@vger.kernel.org, kvm@vger.kernel.org,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock
Date: Thu, 27 Jun 2013 14:43:02 +0300
Message-ID: <20130627114302.GK18508@redhat.com>
In-Reply-To: <51CC2435.7080204@redhat.com>

On Thu, Jun 27, 2013 at 01:38:29PM +0200, Paolo Bonzini wrote:
> On 27/06/2013 13:09, Gleb Natapov wrote:
> > On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
> >> In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"),
> > I am copying Jan, the author of the patch. Commit message says:
> > "Code under this lock requires non-preemptibility", but which code
> > exactly is this? Is this still true?
> 
> hardware_enable_nolock/hardware_disable_nolock does.
> 
I suspected this would be the answer and prepared another question :)
From a glance, kvm_lock is used to protect those just to avoid creating
a separate lock, so why not create a raw one to protect them and change
kvm_lock back to non-raw? Admittedly, I haven't looked too closely into
this yet.
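
A rough userspace sketch of the split suggested above, for illustration
only. It is not kernel code: pthread mutexes stand in for both lock
types, and every name in it (hw_enable_lock, vm_list_lock, the model_*
functions) is hypothetical. The point is only that the enable/disable
accounting and the VM list need not share a lock.

```c
/* Userspace model only; all names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <pthread.h>

/* Stands in for a new raw lock guarding only the enable/disable path. */
static pthread_mutex_t hw_enable_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stands in for kvm_lock turned back into a non-raw lock (VM list only). */
static pthread_mutex_t vm_list_lock = PTHREAD_MUTEX_INITIALIZER;

static int hw_usage_count;	/* protected by hw_enable_lock */
static int vm_count;		/* protected by vm_list_lock */

/* Returns 1 when this caller is the first user (hardware would be enabled). */
int model_hw_enable(void)
{
	int first;

	pthread_mutex_lock(&hw_enable_lock);
	first = (++hw_usage_count == 1);
	pthread_mutex_unlock(&hw_enable_lock);
	return first;
}

/* Returns 1 when this caller is the last user (hardware would be disabled). */
int model_hw_disable(void)
{
	int last;

	pthread_mutex_lock(&hw_enable_lock);
	assert(hw_usage_count > 0);
	last = (--hw_usage_count == 0);
	pthread_mutex_unlock(&hw_enable_lock);
	return last;
}

/* The two locks are taken one after the other and never nested. */
void model_vm_create(void)
{
	model_hw_enable();

	pthread_mutex_lock(&vm_list_lock);
	vm_count++;
	pthread_mutex_unlock(&vm_list_lock);
}

int model_vm_count(void)
{
	int n;

	pthread_mutex_lock(&vm_list_lock);
	n = vm_count;
	pthread_mutex_unlock(&vm_list_lock);
	return n;
}
```

With this shape, a list walker such as the shrinker would only ever hold
the non-raw list lock, so taking mmu_lock under it would no longer mix
raw and non-raw locks.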

> Paolo
> 
> >> the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
> >> function tries to grab the (non-raw) mmu_lock within the scope of
> >> the raw locked kvm_lock being held.  This leads to the following:
> >>
> >> BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> >> in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
> >> Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
> >>
> >> Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
> >> Call Trace:
> >>  [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
> >>  [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
> >>  [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
> >>  [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
> >>  [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
> >>  [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
> >>  [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
> >>  [<ffffffff811185bf>] kswapd+0x18f/0x490
> >>  [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
> >>  [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
> >>  [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
> >>  [<ffffffff81060d2b>] kthread+0xdb/0xe0
> >>  [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
> >>  [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
> >>  [<ffffffff81060c50>] ? __init_kthread_worker+0x
> >>
> >> Since we only use the lock for protecting the vm_list, once we've
> >> found the instance we want, we can shuffle it to the end of the
> >> list and then drop the kvm_lock before taking the mmu_lock.  We
> >> can do this because after the mmu operations are completed, we
> >> break -- i.e. we don't continue list processing, so it doesn't
> >> matter if the list changed around us.
> >>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> >> ---
> >>
> >> [Note1: do double check that this solution makes sense for the
> >>  mainline kernel; consider this an RFC patch that does want a
> >>  review from people in the know.]
> >>
> >> [Note2: you'll need to be running a preempt-rt kernel to actually
> >>  see this.  Also note that the above patch is against linux-next.
> >>  Alternate solutions welcome ; this seemed to me the obvious fix.]
> >>
> >>  arch/x86/kvm/mmu.c | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 748e0d8..db93a70 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> >>  {
> >>  	struct kvm *kvm;
> >>  	int nr_to_scan = sc->nr_to_scan;
> >> +	int found = 0;
> >>  	unsigned long freed = 0;
> >>  
> >>  	raw_spin_lock(&kvm_lock);
> >> @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> >>  			continue;
> >>  
> >>  		idx = srcu_read_lock(&kvm->srcu);
> >> +
> >> +		list_move_tail(&kvm->vm_list, &vm_list);
> >> +		found = 1;
> >> +		/* We can't be holding a raw lock and take non-raw mmu_lock */
> >> +		raw_spin_unlock(&kvm_lock);
> >> +
> >>  		spin_lock(&kvm->mmu_lock);
> >>  
> >>  		if (kvm_has_zapped_obsolete_pages(kvm)) {
> >> @@ -4370,11 +4377,12 @@ unlock:
> >>  		 * per-vm shrinkers cry out
> >>  		 * sadness comes quickly
> >>  		 */
> >> -		list_move_tail(&kvm->vm_list, &vm_list);
> >>  		break;
> >>  	}
> >>  
> >> -	raw_spin_unlock(&kvm_lock);
> >> +	if (!found)
> >> +		raw_spin_unlock(&kvm_lock);
> >> +
> >>  	return freed;
> >>  
> >>  }
> >> -- 
> >> 1.8.1.2
> > 
> > --
> > 			Gleb.
> > 

--
			Gleb.
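
For reference, the pattern described in the quoted patch (rotate the
chosen entry to the list tail, drop the list lock, then take the per-VM
lock) can be modelled in userspace as below. This is a sketch under
assumptions, not the patch itself: an array stands in for vm_list,
pthread mutexes stand in for kvm_lock and mmu_lock, and all names
(vm_model, model_init, model_shrink) are hypothetical. It also ignores
the question of keeping the chosen VM alive once the list lock has been
dropped.

```c
/* Userspace model only; all names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_VMS 3

struct vm_model {
	pthread_mutex_t mmu_lock;	/* stands in for the per-VM mmu_lock */
	int used_pages;
};

static struct vm_model vms[NR_VMS];
static struct vm_model *vm_order[NR_VMS];	/* stands in for vm_list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;	/* kvm_lock */

void model_init(const int pages[NR_VMS])
{
	for (size_t i = 0; i < NR_VMS; i++) {
		pthread_mutex_init(&vms[i].mmu_lock, NULL);
		vms[i].used_pages = pages[i];
		vm_order[i] = &vms[i];
	}
}

/* Move the entry at idx to the end; caller holds list_lock. */
static void rotate_to_tail(size_t idx)
{
	struct vm_model *vm = vm_order[idx];

	for (; idx + 1 < NR_VMS; idx++)
		vm_order[idx] = vm_order[idx + 1];
	vm_order[NR_VMS - 1] = vm;
}

int model_shrink(int nr_to_scan)
{
	struct vm_model *victim = NULL;
	int freed;

	pthread_mutex_lock(&list_lock);
	for (size_t i = 0; i < NR_VMS; i++) {
		/* Single-threaded peek for the model; not synchronized. */
		if (vm_order[i]->used_pages == 0)
			continue;
		victim = vm_order[i];
		rotate_to_tail(i);	/* done while the list lock is held */
		break;
	}
	/* One unlock on every path, before any per-VM lock is taken. */
	pthread_mutex_unlock(&list_lock);

	if (!victim)
		return 0;

	/* A real implementation must ensure victim cannot be freed here. */
	pthread_mutex_lock(&victim->mmu_lock);
	freed = victim->used_pages < nr_to_scan ? victim->used_pages : nr_to_scan;
	victim->used_pages -= freed;
	pthread_mutex_unlock(&victim->mmu_lock);
	return freed;
}
```

Remembering the victim in a pointer and unlocking once after the loop
avoids the need for a separate "found" flag to decide whether the list
lock is still held.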
