Re: 2.6.0-test9-mm2 - AIO tests still gets slab corruption

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Daniel McNeil <daniel@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-aio@kvack.org
Subject: Re: 2.6.0-test9-mm2 - AIO tests still gets slab corruption
Date: Tue, 11 Nov 2003 20:32:29 +0530	[thread overview]
Message-ID: <20031111150229.GA4345@in.ibm.com> (raw)
In-Reply-To: <20031110154232.55eb9b10.akpm@osdl.org>

On Mon, Nov 10, 2003 at 03:42:32PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > Andrew,
> > 
> > test9-mm2 is still getting slab corruption with AIO:
> 
> Why?
> 
> > Maximal retry count.  Bytes done 0
> > Slab corruption: start=dc70f91c, expend=dc70f9eb, problemat=dc70f91c
> > Last user: [<c0192fa3>](__aio_put_req+0xbf/0x200)
> > Data: 00 01 10 00 00 02 20 00 *********6C ******************************A5
> > Next: 71 F0 2C .A3 2F 19 C0 71 F0 2C .********************
> > slab error in check_poison_obj(): cache `kiocb': object was modified after freeing
> > 
> > With suparna's retry-based-aio-dio patch, there are no kernel messages
> > and the tests do not see any uninitialized data.
> > 
> > Any reason not to add suparna's patch to -mm to fix these problems?
> 
> It relies on infrastructure which is not present in Linus's kernel.  We
> should only be interested in fixing mainline 2.6.x.
> 
> Furthermore I'd like to see the direct-vs-buffered locking fixes fully
> implemented against Linus's tree, not -mm.  They're almost there, but are
> not quite complete.  Running off and making it dependent on the retry
> infrastructure is not really helpful.
> 

It was just easier to do this in a non-kludgy way, if we used the
retry infrastructure. Here's why:

For fixing some of the cases, we run into a situation when we've 
already submitted some of the I/O as AIO (with AIO callbacks set up)
by the time we realise that we actually need to wait for that to 
complete synchronously before falling back to buffered i/o (otherwise
we can corrupt file data). 

With the retry model it is only the actual wait that occurs 
differently for AIO and Sync I/O, not the submission. So we 
can simply switch to be synchronous at the latter stage.

Having done that, though, I was actually working on trying 
to find a way to do this that could hold for the mainline as well
(i.e. without using retry infrastructure). The attached patch has 
some special casing tweaks that might do the job; it 
modifies the AIO-DIO callback to wakeup the caller synchronously 
instead of issuing an aio_complete in such situations. 

However the existing aio-dio tests do not seem to exercise some 
of those code paths, so I haven't had a chance to verify 
if it really works for that case (i.e. A single AIO-DIO request 
overwriting an allocated region followed by a hole).

The patch should apply to 2.6.0-test9-mm2.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India


--- pure-mm/fs/direct-io.c	2003-10-30 14:22:51.000000000 +0530
+++ linux-2.6.0-test9-mm2/fs/direct-io.c	2003-10-31 17:09:35.000000000 +0530
@@ -209,7 +209,7 @@
  */
 static void dio_complete(struct dio *dio, loff_t offset, ssize_t bytes)
 {
-	if (dio->end_io)
+	if (dio->end_io && dio->result)
 		dio->end_io(dio->inode, offset, bytes, dio->map_bh.b_private);
 	if (dio->needs_locking)
 		up_read(&dio->inode->i_alloc_sem);
@@ -225,8 +225,14 @@
 		if (dio->is_async) {
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
-			aio_complete(dio->iocb, dio->result, 0);
-			kfree(dio);
+			/* Complete AIO later if falling back to buffered i/o */
+			if (dio->result != -ENOTBLK) {
+				aio_complete(dio->iocb, dio->result, 0);
+				kfree(dio);
+			} else {
+				if (dio->waiter)
+					wake_up_process(dio->waiter);
+			}
 		}
 	}
 }
@@ -877,8 +883,6 @@
 	int ret2;
 	size_t bytes;
 
-	dio->is_async = !is_sync_kiocb(iocb);
-
 	dio->bio = NULL;
 	dio->inode = inode;
 	dio->rw = rw;
@@ -969,10 +973,11 @@
 		dio_bio_submit(dio);
 
 	/*
-	 * All new block allocations have been performed.  We can let i_sem
-	 * go now.
+	 * All block lookups have been performed. For READ requests
+	 * we can let i_sem go now that its achieved its purpose
+	 * of protecting us from looking up uninitialized blocks.
 	 */
-	if (dio->needs_locking)
+	if ((rw == READ) && dio->needs_locking)
 		up(&dio->inode->i_sem);
 
 	/*
@@ -982,8 +987,30 @@
 	if (dio->is_async) {
 		if (ret == 0)
 			ret = dio->result;	/* Bytes written */
+		if (ret == -ENOTBLK) {
+			/*
+			 * The request will be reissued via buffered I/O
+			 * when we return; Any I/O already issued
+			 * effectively becomes redundant.
+			 */
+			dio->result = ret;
+			dio->waiter = current;
+		}
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
+		if (ret == -ENOTBLK) {
+			/*
+			 * Wait for already issued I/O to drain out and
+			 * release its references to user-space pages
+			 * before returning to fallback on buffered I/O
+			 */
+			while (atomic_read(&dio->bio_count)) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				io_schedule();
+			}
+			set_current_state(TASK_RUNNING);
+			dio->waiter = NULL;
+		}
 	} else {
 		finished_one_bio(dio);
 		ret2 = dio_await_completion(dio);
@@ -1003,6 +1029,9 @@
 				ret = i_size - offset;
 		}
 		dio_complete(dio, offset, ret);
+		/* We could have also come here on an AIO file extend */
+		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
 	return ret;
@@ -1029,6 +1058,7 @@
 	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	ssize_t retval = -EINVAL;
+	loff_t end = offset;
 	struct dio *dio;
 	int needs_locking;
 
@@ -1047,6 +1077,7 @@
 	for (seg = 0; seg < nr_segs; seg++) {
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
+		end += size;
 		if ((addr & blocksize_mask) || (size & blocksize_mask))  {
 			if (bdev)
 				 blkbits = bdev_blkbits;
@@ -1081,11 +1112,17 @@
 		down_read(&inode->i_alloc_sem);
 	}
 	dio->needs_locking = needs_locking;
+	/*
+	 * For file extending writes updating i_size before data
+	 * writeouts complete can expose uninitialized blocks. So
+	 * even for AIO, we need to wait for i/o to complete before
+	 * returning in this case.
+	 */
+	dio->is_async = !is_sync_kiocb(iocb) && !((rw == WRITE) &&
+		(end > i_size_read(inode)));
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_blocks, end_io, dio);
-	if (needs_locking && rw == WRITE)
-		down(&inode->i_sem);
 out:
 	return retval;
 }

WARNING: multiple messages have this Message-ID (diff)

From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Daniel McNeil <daniel@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-aio@kvack.org
Subject: Re: 2.6.0-test9-mm2 - AIO tests still gets slab corruption
Date: Tue, 11 Nov 2003 20:32:29 +0530	[thread overview]
Message-ID: <20031111150229.GA4345@in.ibm.com> (raw)
In-Reply-To: <20031110154232.55eb9b10.akpm@osdl.org>

On Mon, Nov 10, 2003 at 03:42:32PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > Andrew,
> > 
> > test9-mm2 is still getting slab corruption with AIO:
> 
> Why?
> 
> > Maximal retry count.  Bytes done 0
> > Slab corruption: start=dc70f91c, expend=dc70f9eb, problemat=dc70f91c
> > Last user: [<c0192fa3>](__aio_put_req+0xbf/0x200)
> > Data: 00 01 10 00 00 02 20 00 *********6C ******************************A5
> > Next: 71 F0 2C .A3 2F 19 C0 71 F0 2C .********************
> > slab error in check_poison_obj(): cache `kiocb': object was modified after freeing
> > 
> > With suparna's retry-based-aio-dio patch, there are no kernel messages
> > and the tests do not see any uninitialized data.
> > 
> > Any reason not to add suparna's patch to -mm to fix these problems?
> 
> It relies on infrastructure which is not present in Linus's kernel.  We
> should only be interested in fixing mainline 2.6.x.
> 
> Furthermore I'd like to see the direct-vs-buffered locking fixes fully
> implemented against Linus's tree, not -mm.  They're almost there, but are
> not quite complete.  Running off and making it dependent on the retry
> infrastructure is not really helpful.
> 

It was just easier to do this in a non-kludgy way, if we used the
retry infrastructure. Here's why:

For fixing some of the cases, we run into a situation when we've 
already submitted some of the I/O as AIO (with AIO callbacks set up)
by the time we realise that we actually need to wait for that to 
complete synchronously before falling back to buffered i/o (otherwise
we can corrupt file data). 

With the retry model it is only the actual wait that occurs 
differently for AIO and Sync I/O, not the submission. So we 
can simply switch to be synchronous at the latter stage.

Having done that, though, I was actually working on trying 
to find a way to do this that could hold for the mainline as well
(i.e. without using retry infrastructure). The attached patch has 
some special casing tweaks that might do the job; it 
modifies the AIO-DIO callback to wakeup the caller synchronously 
instead of issuing an aio_complete in such situations. 

However the existing aio-dio tests do not seem to exercise some 
of those code paths, so I haven't had a chance to verify 
if it really works for that case (i.e. A single AIO-DIO request 
overwriting an allocated region followed by a hole).

The patch should apply to 2.6.0-test9-mm2.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India


--- pure-mm/fs/direct-io.c	2003-10-30 14:22:51.000000000 +0530
+++ linux-2.6.0-test9-mm2/fs/direct-io.c	2003-10-31 17:09:35.000000000 +0530
@@ -209,7 +209,7 @@
  */
 static void dio_complete(struct dio *dio, loff_t offset, ssize_t bytes)
 {
-	if (dio->end_io)
+	if (dio->end_io && dio->result)
 		dio->end_io(dio->inode, offset, bytes, dio->map_bh.b_private);
 	if (dio->needs_locking)
 		up_read(&dio->inode->i_alloc_sem);
@@ -225,8 +225,14 @@
 		if (dio->is_async) {
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
-			aio_complete(dio->iocb, dio->result, 0);
-			kfree(dio);
+			/* Complete AIO later if falling back to buffered i/o */
+			if (dio->result != -ENOTBLK) {
+				aio_complete(dio->iocb, dio->result, 0);
+				kfree(dio);
+			} else {
+				if (dio->waiter)
+					wake_up_process(dio->waiter);
+			}
 		}
 	}
 }
@@ -877,8 +883,6 @@
 	int ret2;
 	size_t bytes;
 
-	dio->is_async = !is_sync_kiocb(iocb);
-
 	dio->bio = NULL;
 	dio->inode = inode;
 	dio->rw = rw;
@@ -969,10 +973,11 @@
 		dio_bio_submit(dio);
 
 	/*
-	 * All new block allocations have been performed.  We can let i_sem
-	 * go now.
+	 * All block lookups have been performed. For READ requests
+	 * we can let i_sem go now that its achieved its purpose
+	 * of protecting us from looking up uninitialized blocks.
 	 */
-	if (dio->needs_locking)
+	if ((rw == READ) && dio->needs_locking)
 		up(&dio->inode->i_sem);
 
 	/*
@@ -982,8 +987,30 @@
 	if (dio->is_async) {
 		if (ret == 0)
 			ret = dio->result;	/* Bytes written */
+		if (ret == -ENOTBLK) {
+			/*
+			 * The request will be reissued via buffered I/O
+			 * when we return; Any I/O already issued
+			 * effectively becomes redundant.
+			 */
+			dio->result = ret;
+			dio->waiter = current;
+		}
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
+		if (ret == -ENOTBLK) {
+			/*
+			 * Wait for already issued I/O to drain out and
+			 * release its references to user-space pages
+			 * before returning to fallback on buffered I/O
+			 */
+			while (atomic_read(&dio->bio_count)) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				io_schedule();
+			}
+			set_current_state(TASK_RUNNING);
+			dio->waiter = NULL;
+		}
 	} else {
 		finished_one_bio(dio);
 		ret2 = dio_await_completion(dio);
@@ -1003,6 +1029,9 @@
 				ret = i_size - offset;
 		}
 		dio_complete(dio, offset, ret);
+		/* We could have also come here on an AIO file extend */
+		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
 	return ret;
@@ -1029,6 +1058,7 @@
 	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	ssize_t retval = -EINVAL;
+	loff_t end = offset;
 	struct dio *dio;
 	int needs_locking;
 
@@ -1047,6 +1077,7 @@
 	for (seg = 0; seg < nr_segs; seg++) {
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
+		end += size;
 		if ((addr & blocksize_mask) || (size & blocksize_mask))  {
 			if (bdev)
 				 blkbits = bdev_blkbits;
@@ -1081,11 +1112,17 @@
 		down_read(&inode->i_alloc_sem);
 	}
 	dio->needs_locking = needs_locking;
+	/*
+	 * For file extending writes updating i_size before data
+	 * writeouts complete can expose uninitialized blocks. So
+	 * even for AIO, we need to wait for i/o to complete before
+	 * returning in this case.
+	 */
+	dio->is_async = !is_sync_kiocb(iocb) && !((rw == WRITE) &&
+		(end > i_size_read(inode)));
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_blocks, end_io, dio);
-	if (needs_locking && rw == WRITE)
-		down(&inode->i_sem);
 out:
 	return retval;
 }
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next prev parent reply	other threads:[~2003-11-11 14:57 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-05  6:55 2.6.0-test9-mm2 Andrew Morton
2003-11-05  6:55 ` 2.6.0-test9-mm2 Andrew Morton
2003-11-05 12:30 ` 2.6.0-test9-mm2 Alexander Hoogerhuis
2003-11-05 12:30   ` 2.6.0-test9-mm2 Alexander Hoogerhuis
2003-11-05 16:10 ` 2.6.0-test9-mm2 (compile stats) John Cherry
2003-11-05 16:10   ` John Cherry
2003-11-05 17:02 ` 2.6.0-test9-mm2 Alistair John Strachan
2003-11-05 17:02   ` 2.6.0-test9-mm2 Alistair John Strachan
2003-11-05 23:07 ` 2.6.0-test9-mm2 Martin J. Bligh
2003-11-05 23:07   ` 2.6.0-test9-mm2 Martin J. Bligh
2003-11-10 23:06 ` 2.6.0-test9-mm2 - AIO tests still gets slab corruption Daniel McNeil
2003-11-10 23:06   ` Daniel McNeil
2003-11-10 23:42   ` Andrew Morton
2003-11-10 23:42     ` Andrew Morton
2003-11-11 15:02     ` Suparna Bhattacharya [this message]
2003-11-11 15:02       ` Suparna Bhattacharya
2003-11-12 20:10       ` Daniel McNeil
2003-11-12 20:10         ` Daniel McNeil
2003-11-13 11:29         ` Suparna Bhattacharya
2003-11-13 11:29           ` Suparna Bhattacharya
2003-11-11 18:01     ` [PATCH 2.6.0-test9] AIO-ref-count.patch Daniel McNeil
2003-11-11 17:25 ` 2.6.0-test9-mm2 Daniel Drake
2003-11-11 17:25   ` 2.6.0-test9-mm2 Daniel Drake
2003-11-12  1:18   ` 2.6.0-test9-mm2 Nick Piggin
2003-11-12  1:18     ` 2.6.0-test9-mm2 Nick Piggin
2003-11-12  3:45     ` 2.6.0-test9-mm2 Mike Fedyk
2003-11-12  3:45       ` 2.6.0-test9-mm2 Mike Fedyk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031111150229.GA4345@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=daniel@osdl.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.