From: Minchan Kim <minchan@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Rafael Aquini <aquini@redhat.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
Date: Mon, 4 Jan 2016 09:27:47 +0900
Message-ID: <20160104002747.GA31090@blaptop.local>
In-Reply-To: <20160101102756-mutt-send-email-mst@redhat.com>

On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > In balloon_page_dequeue, pages_lock should cover the loop
> > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > be isolated by compaction and then list_del by isolation could
> > poison the page->lru.{prev,next} so the loop finally could
> > access wrong address like this. This patch fixes the bug.
> > 
> > general protection fault: 0000 [#1] SMP
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > Stack:
> >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > Call Trace:
> >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> >  RSP <ffff8800a7fefdc0>
> > ---[ end trace 43cf28060d708d5f ]---
> > Kernel panic - not syncing: Fatal exception
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> > 
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/balloon_compaction.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > index d3116be5a00f..300117f1a08f 100644
> > --- a/mm/balloon_compaction.c
> > +++ b/mm/balloon_compaction.c
> > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  	bool dequeued_page;
> >  
> >  	dequeued_page = false;
> > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >  		/*
> >  		 * Block others from accessing the 'page' while we get around
> > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  				continue;
> >  			}
> >  #endif
> > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  			balloon_page_delete(page);
> >  			__count_vm_event(BALLOON_DEFLATE);
> > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  			unlock_page(page);
> >  			dequeued_page = true;
> >  			break;
> >  		}
> >  	}
> > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  
> >  	if (!dequeued_page) {
> >  		/*
> 
> I think this will cause deadlocks.
> 
> pages_lock now nests within page lock, balloon_page_putback
> nests them in the reverse order.

balloon_page_dequeue only uses trylock_page, which never waits for the
page lock, so I don't think this can deadlock.
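
To illustrate the point, here is a minimal userspace sketch, not kernel
code. It assumes pthread mutexes as stand-ins for the page lock and
pages_lock; the names dequeue_one and dequeue_while_page_locked are made
up for the example. The dequeue side holds pages_lock and only tries the
page lock, so it backs off instead of waiting when another path already
owns the page lock.

```c
/* Userspace illustration only: pthread mutexes stand in for kernel locks. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the per-page lock (lock_page/trylock_page). */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for b_dev_info->pages_lock. */
static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Dequeue side as in the patch: pages_lock is held across the walk and
 * the page lock is only tried. Returns true if the page was taken.
 */
static bool dequeue_one(void)
{
	bool dequeued = false;

	pthread_mutex_lock(&pages_lock);
	if (pthread_mutex_trylock(&page_lock) == 0) {
		/* balloon_page_delete() would run here. */
		dequeued = true;
		pthread_mutex_unlock(&page_lock);
	}
	/* If the trylock failed, the page is skipped; nothing waits. */
	pthread_mutex_unlock(&pages_lock);
	return dequeued;
}

/*
 * Scenario: a putback-like path already owns the page lock when the
 * dequeue side runs, then takes pages_lock inside the page lock.
 */
static bool dequeue_while_page_locked(void)
{
	bool dequeued;

	pthread_mutex_lock(&page_lock);   /* other path: page lock first */
	dequeued = dequeue_one();         /* trylock fails, returns at once */
	pthread_mutex_lock(&pages_lock);  /* other path: pages_lock is free */
	pthread_mutex_unlock(&pages_lock);
	pthread_mutex_unlock(&page_lock);
	return dequeued;
}
```

In this sketch dequeue_while_page_locked() returns false because the
page is skipped, and a later dequeue_one() with no locks held succeeds.
The opposite-order path never ends up waiting on a holder that is itself
waiting.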

> 
> Did you test this with lockdep? You really should for
> locking changes, and I'd expect it to warn about this.

I did, but I didn't see any warning.

> 
> Also, there's another issue there I think: after isolation page could
> also get freed before we try to lock it.

If a page was isolated, it should no longer be on the
b_dev_info->pages list, so balloon_page_dequeue cannot see it.
Am I missing something?
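
As a rough illustration of what I mean, here is a small userspace
sketch, assuming a simplified copy of the kernel's circular list. The
names fake_page, list_contains and visible_after_isolation are made up
for the example, and list_del here clears the pointers to NULL where the
kernel writes poison values. Once isolation has done its list_del, a
walk that starts from the list head afterwards no longer reaches the
page.

```c
/* Userspace illustration only: a simplified circular doubly linked list. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct page; only the lru linkage matters. */
struct fake_page {
	int id;
	struct list_head lru;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink an entry; the kernel poisons next/prev, this sketch uses NULL. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Walk from the head, as a fresh dequeue loop would. */
static bool list_contains(const struct list_head *head,
			  const struct list_head *entry)
{
	for (const struct list_head *pos = head->next; pos != head;
	     pos = pos->next)
		if (pos == entry)
			return true;
	return false;
}

/* Isolate page a, then check whether a new walk can still find it. */
static bool visible_after_isolation(void)
{
	struct list_head pages;
	struct fake_page a = { .id = 1 }, b = { .id = 2 };

	list_init(&pages);
	list_add_tail(&a.lru, &pages);
	list_add_tail(&b.lru, &pages);

	list_del(&a.lru); /* what isolation does to the page's lru entry */

	return list_contains(&pages, &a.lru);
}
```

This only covers a walk that begins after the isolation. A walker that
cached the page as its cursor beforehand is the case the patch closes by
holding pages_lock across the loop.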

> 
> We really must take a page reference before touching
> the page.
> 
> I think we need something like the below to fix this issue.
> Could you please try this out, and send Tested-by?
> I will repost as a proper patch if this works for you.

If I missed something, I'm happy to retest and report the result
when I'm back in the office.

Thanks.

> 
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be..66d69c5 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
> +	struct page *page;
>  	unsigned long flags;
>  	bool dequeued_page;
> +	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
>  
>  	dequeued_page = false;
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +	/*
> +	 * We need to go over b_dev_info->pages and lock each page,
> +	 * but b_dev_info->pages_lock must nest within page lock.
> +	 *
> +	 * To make this safe, remove each page from b_dev_info->pages list
> +	 * under b_dev_info->pages_lock, then drop this lock. Once list is
> +	 * empty, re-add them also under b_dev_info->pages_lock.
> +	 */
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	while (!list_empty(&b_dev_info->pages)) {
> +		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
> +		/* move to processed list to avoid going over it another time */
> +		list_move(&page->lru, &processed);
> +
> +		if (!get_page_unless_zero(page))
> +			continue;
> +		/*
> +		 * pages_lock nests within page lock,
> +		 * so drop it before trylock_page
> +		 */
> +		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
>  		/*
>  		 * Block others from accessing the 'page' while we get around
>  		 * establishing additional references and preparing the 'page'
> @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			if (!PagePrivate(page)) {
>  				/* raced with isolation */
>  				unlock_page(page);
> +				put_page(page);
>  				continue;
>  			}
>  #endif
> @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			__count_vm_event(BALLOON_DEFLATE);
>  			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
> +			put_page(page);
>  			dequeued_page = true;
>  			break;
>  		}
> +		put_page(page);
> +		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	}
>  
> +	/* re-add remaining entries */
> +	list_splice(&processed, &b_dev_info->pages);
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
>  	if (!dequeued_page) {
>  		/*
>  		 * If we are unable to dequeue a balloon page because the page

-- 
Kind regards,
Minchan Kim


