public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@suse.de>,
	Jiri Kosina <jkosina@suse.cz>, Yinghai Lu <yhlu.kernel@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: [PATCH percpu#for-linus] percpu: fix possible deadlock via irq lock inversion
Date: Mon, 09 Nov 2009 14:46:52 +0900	[thread overview]
Message-ID: <4AF7ACCC.2050208@kernel.org> (raw)
In-Reply-To: <1257610412.4108.3.camel@laptop>

Lockdep caught the following irq lock inversion which can lead to
deadlock.

 =========================================================
 [ INFO: possible irq lock inversion dependency detected ]
 2.6.32-rc5-tip-04815-g12f0f93-dirty #745
 ---------------------------------------------------------
 hub 1-3:1.0: state 7 ports 2 chg 0000 evt 0004
 ksoftirqd/65/199 just changed the state of lock:
  (pcpu_lock){..-...}, at: [<ffffffff81130e04>] free_percpu+0x38/0x104
 but this lock took another, SOFTIRQ-unsafe lock in the past:
  (vmap_area_lock){+.+...}

 and interrupts could create inverse lock ordering between them.

This happens because pcpu_lock is allowed to be acquired from irq
context for free_percpu() path and alloc_percpu() path may call
vfree() with pcpu_lock held.  As vmap_area_lock isn't irq safe, if an
IRQ occurs while vmap_area_lock is held and the irq handler calls
free_percpu(), locking order inversion occurs.

As the nesting only occurs in the alloc path which isn't allowed to be
called from irq context, A->B->A deadlock won't occur but A->B B->A
deadlock is still possible.

The only place where vmap_area_lock is nested under pcpu_lock is while
extending area_map to free old map.  Break the locking order by
temporarily releasing pcpu_lock when freeing old map.  This is safe to
do as allocation path is protected with outer pcpu_alloc_mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
If nobody objects, I'll push it to Linus tomorrow.  Thanks.

 mm/percpu.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..30cd343 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -372,7 +372,7 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 {
 	int new_alloc;
-	int *new;
+	int *new, *old = NULL;
 	size_t size;

 	/* has enough? */
@@ -407,10 +407,23 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
+	 * held.  Release and reacquire pcpu_lock if old map needs to
+	 * be freed.
+	 */
+	if (old) {
+		spin_unlock_irqrestore(&pcpu_lock, *flags);
+		pcpu_mem_free(old, size);
+		spin_lock_irqsave(&pcpu_lock, *flags);
+	}
+
 	return 0;
 }


  reply	other threads:[~2009-11-09  5:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <86802c440911041008q4969b9bdk15b4598c40bb84bd@mail.gmail.com>
     [not found] ` <4AF25FC7.4000502@kernel.org>
     [not found]   ` <20091105082102.GA2870@elte.hu>
     [not found]     ` <4AF28D7A.6020209@kernel.org>
2009-11-05 14:31       ` irq lock inversion Jiri Kosina
2009-11-06  5:53         ` Tejun Heo
2009-11-06  7:17           ` Ingo Molnar
2009-11-06  7:45             ` Tejun Heo
2009-11-06  7:58               ` Ingo Molnar
2009-11-06  8:24                 ` Tejun Heo
2009-11-06  8:40                   ` Ingo Molnar
2009-11-06  8:52                     ` Tejun Heo
2009-11-06 16:08                       ` Christoph Lameter
2009-11-06 16:38                         ` Tejun Heo
2009-11-06 17:03                           ` Christoph Lameter
2009-11-07 16:13                             ` Peter Zijlstra
2009-11-09  5:46                               ` Tejun Heo [this message]
2009-11-06  9:59             ` Jens Axboe
2009-11-08  9:38               ` Ingo Molnar
2009-11-09 15:34                 ` Jens Axboe
2009-11-09 15:45                   ` Ingo Molnar
2009-11-09 15:49                     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AF7ACCC.2050208@kernel.org \
    --to=tj@kernel.org \
    --cc=cl@linux-foundation.org \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=yhlu.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox