From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757443AbbFPXFb (ORCPT );
	Tue, 16 Jun 2015 19:05:31 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60792 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751555AbbFPXFX (ORCPT );
	Tue, 16 Jun 2015 19:05:23 -0400
Date: Wed, 17 Jun 2015 01:04:14 +0200
From: Oleg Nesterov
To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
Cc: linux-aio@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/3] aio: ctx->dead cleanups
Message-ID: <20150616230414.GA15776@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Al, please help.

We are trying to backport some aio fixes and I am absolutely confused by
your b2edffdd912b "fix mremap() vs. ioctx_kill() race".

Firstly, I simply can't understand what exactly it tries to fix. OK,
aio_free_ring() can race with kill and we can remap the soon-to-be-killed
ctx. So what? kill_ioctx() will use the correct (already re-mapped)
ctx->mmap_base after it drops mm->ioctx_lock.

So it seems to me we only need this change to ensure that move_vma() can
not succeed if ctx was already removed from ->ioctx_table, or, if we race
with ioctx_alloc(), was not yet added to ->ioctx_table. IOW, we need to
ensure that move_vma()->aio_ring_remap() can not race with
vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc().

And this race doesn't look really bad: the kernel can't crash, the
application can only fool itself.

But I guess I missed something, and I'd like to know what I have missed.
Could you explain?

Also. The change in move_vma() looks "obviously wrong". Don't we need
something like the patch at the end to ensure we do not "leak" new_vma,
or am I totally confused?
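To spell out the interleaving I have in mind (my reading of the window,
so please correct me if this is not the race the commit is about): if
aio_ring_remap() runs after kill_ioctx() has already removed ctx from
->ioctx_table, it finds nothing to update and ctx->mmap_base keeps the
old address:

	mremap() of the aio ring		io_destroy()

	move_vma()
	  move_page_tables()
						kill_ioctx()
						  takes mm->ioctx_lock
						  removes ctx from ->ioctx_table
						  drops mm->ioctx_lock
	  ->mremap() == aio_ring_remap()
	    ctx not found in ->ioctx_table,
	    ctx->mmap_base is not updated
						  vm_munmap(ctx->mmap_base)
						  unmaps the old range; the ring
						  lives on at the new address

which is why I say the application can only fool itself.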
But to me the main problem is atomic_read(&ctx->dead) in aio_ring_remap().
I mean, it complicates the backporting, and it looks unnecessary and
confusing. See the 1st patch.

Please review. I do not know how to test this.

Oleg.

--- x/mm/mremap.c
+++ x/mm/mremap.c
@@ -275,6 +275,8 @@ static unsigned long move_vma(struct vm_
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
 				     need_rmap_locks);
 	if (moved_len < old_len) {
+		err = -ENOMEM;
+xxx:
 		/*
 		 * On error, move entries back from new area to old,
 		 * which will succeed since page tables still there,
@@ -285,14 +287,11 @@ static unsigned long move_vma(struct vm_
 		vma = new_vma;
 		old_len = new_len;
 		old_addr = new_addr;
-		new_addr = -ENOMEM;
+		new_addr = err;
 	} else if (vma->vm_file && vma->vm_file->f_op->mremap) {
 		err = vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
-		if (err < 0) {
-			move_page_tables(new_vma, new_addr, vma, old_addr,
-					 moved_len, true);
-			return err;
-		}
+		if (err < 0)
+			goto xxx;
 	}

 	/* Conceal VM_ACCOUNT so old reservation is not undone */