All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Daniel McNeil <daniel@osdl.org>
Cc: Andrew Morton <akpm@osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, "linux-aio@kvack.org" <linux-aio@kvack.org>
Subject: Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
Date: Wed, 26 Nov 2003 13:25:13 +0530	[thread overview]
Message-ID: <20031126075513.GA3902@in.ibm.com> (raw)
In-Reply-To: <1069804171.1841.23.camel@ibm-c.pdx.osdl.net>

On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> Suparna,
> 
> Yes your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
> was helping me see problems because the the freed dio was getting
> poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
> very good at catching these.

Ah I see - perhaps that explains why neither Janet nor I could
recreate the problem that you were hitting so easily. So we 
should probably try running with CONFIG_DEBUG_SLAB and
CONFIG_DEBUG_PAGEALLOC as well.

> 
> I updated your AIO fallback patch plus your AIO race plus I fixed
> the bio_count decrement fix.  This patch has all three fixes and
> it is working for me.
> 
> I fixed the bio_count race, by changing bio_list_lock into bio_lock
> and using that for all the bio fields.  I changed bio_count and
> bios_in_flight from atomics into int.  They are now proctected by
> the bio_lock.  I fixed the race, by in finished_one_bio() by
> leaving the bio_count at 1 until after the dio_complete()
> and then do the bio_count decrement and wakeup holding the bio_lock.
> 
> Take a look, give it a try, and let me know what you think.

I had been trying a slightly different kind of fix -- appended is
the updated version of the patch I last posted. It uses the bio_list_lock
to protect the dio->waiter field, which finished_one_bio sets back
to NULL after it has issued the wakeup; and the code that waits for
i/o to drain out checks the dio->waiter field instead of bio_count.
This might not seem very obvious given the nomenclature of the 
bio_list_lock, so I was holding back wondering if it could be 
improved. 

Your approach looks clearer in that sense -- its pretty unambiguous
about what lock protects what fields. The only thing that bothers me (and
this is what I was trying to avoid in my patch) is the increased
use of spin_lock_irq 's (overhead of turning interrupts off and on)
instead of simple atomic inc/dec in most places.

Thoughts ?

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------

Don't access dio fields if its possible that the dio could
already have been freed asynchronously during i/o completion.
The dio->bio_list_lock protects the dio->waiter field as in
the case of synchronous i/o.

--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-25 14:08:26.000000000 +0530
@@ -231,8 +231,17 @@
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
-				if (dio->waiter)
-					wake_up_process(dio->waiter);
+				struct task_struct *waiter;
+				unsigned long flags;
+
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
+				waiter = dio->waiter;
+				if (waiter) {
+					dio->waiter = NULL;
+					wake_up_process(waiter);
+				}
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+					flags);
 			}
 		}
 	}
@@ -994,26 +1004,35 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+			spin_lock_irqsave(&dio->bio_list_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->waiter) {
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+				flags);
 				io_schedule();
 				set_current_state(TASK_UNINTERRUPTIBLE);
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);

WARNING: multiple messages have this Message-ID (diff)
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Daniel McNeil <daniel@osdl.org>
Cc: Andrew Morton <akpm@osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, "linux-aio@kvack.org" <linux-aio@kvack.org>
Subject: Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
Date: Wed, 26 Nov 2003 13:25:13 +0530	[thread overview]
Message-ID: <20031126075513.GA3902@in.ibm.com> (raw)
In-Reply-To: <1069804171.1841.23.camel@ibm-c.pdx.osdl.net>

On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> Suparna,
> 
> Yes your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
> was helping me see problems because the the freed dio was getting
> poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
> very good at catching these.

Ah I see - perhaps that explains why neither Janet nor I could
recreate the problem that you were hitting so easily. So we 
should probably try running with CONFIG_DEBUG_SLAB and
CONFIG_DEBUG_PAGEALLOC as well.

> 
> I updated your AIO fallback patch plus your AIO race plus I fixed
> the bio_count decrement fix.  This patch has all three fixes and
> it is working for me.
> 
> I fixed the bio_count race, by changing bio_list_lock into bio_lock
> and using that for all the bio fields.  I changed bio_count and
> bios_in_flight from atomics into int.  They are now proctected by
> the bio_lock.  I fixed the race, by in finished_one_bio() by
> leaving the bio_count at 1 until after the dio_complete()
> and then do the bio_count decrement and wakeup holding the bio_lock.
> 
> Take a look, give it a try, and let me know what you think.

I had been trying a slightly different kind of fix -- appended is
the updated version of the patch I last posted. It uses the bio_list_lock
to protect the dio->waiter field, which finished_one_bio sets back
to NULL after it has issued the wakeup; and the code that waits for
i/o to drain out checks the dio->waiter field instead of bio_count.
This might not seem very obvious given the nomenclature of the 
bio_list_lock, so I was holding back wondering if it could be 
improved. 

Your approach looks clearer in that sense -- its pretty unambiguous
about what lock protects what fields. The only thing that bothers me (and
this is what I was trying to avoid in my patch) is the increased
use of spin_lock_irq 's (overhead of turning interrupts off and on)
instead of simple atomic inc/dec in most places.

Thoughts ?

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------

Don't access dio fields if its possible that the dio could
already have been freed asynchronously during i/o completion.
The dio->bio_list_lock protects the dio->waiter field as in
the case of synchronous i/o.

--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-25 14:08:26.000000000 +0530
@@ -231,8 +231,17 @@
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
-				if (dio->waiter)
-					wake_up_process(dio->waiter);
+				struct task_struct *waiter;
+				unsigned long flags;
+
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
+				waiter = dio->waiter;
+				if (waiter) {
+					dio->waiter = NULL;
+					wake_up_process(waiter);
+				}
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+					flags);
 			}
 		}
 	}
@@ -994,26 +1004,35 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+			spin_lock_irqsave(&dio->bio_list_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->waiter) {
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+				flags);
 				io_schedule();
 				set_current_state(TASK_UNINTERRUPTIBLE);
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2003-11-26  7:49 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
2003-11-13  7:30 ` 2.6.0-test9-mm3 Andrew Morton
2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
2003-11-13 20:03   ` john stultz
2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
2003-11-13 22:03   ` Daniel McNeil
2003-11-17  5:25   ` Suparna Bhattacharya
2003-11-17  5:25     ` Suparna Bhattacharya
2003-11-18  1:15     ` Daniel McNeil
2003-11-18  1:15       ` Daniel McNeil
2003-11-18  1:37       ` Daniel McNeil
2003-11-18  1:37         ` Daniel McNeil
2003-11-18 11:55         ` Suparna Bhattacharya
2003-11-18 11:55           ` Suparna Bhattacharya
2003-11-18 23:47           ` Daniel McNeil
2003-11-18 23:47             ` Daniel McNeil
2003-11-24  9:42             ` Suparna Bhattacharya
2003-11-24  9:42               ` Suparna Bhattacharya
2003-11-25 23:49               ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
2003-11-26  7:55                 ` Suparna Bhattacharya [this message]
2003-11-26  7:55                   ` Suparna Bhattacharya
2003-12-02  1:35                   ` Daniel McNeil
2003-12-02  1:35                     ` Daniel McNeil
2003-12-02 15:25                     ` Suparna Bhattacharya
2003-12-02 15:25                       ` Suparna Bhattacharya
2003-12-03 23:14                       ` Daniel McNeil
2003-12-03 23:14                         ` Daniel McNeil
2003-12-04  4:40                         ` Suparna Bhattacharya
2003-12-04  4:40                           ` Suparna Bhattacharya
2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14  5:07   ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 20:57   ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 20:57     ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 21:57     ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 21:57       ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 21:37       ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 21:37         ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 21:47       ` 2.6.0-test9-mm3 Linus Torvalds
2003-11-14 21:47         ` 2.6.0-test9-mm3 Linus Torvalds
2003-11-15  0:55         ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-15  0:55           ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-15 19:34           ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
2003-11-15 19:34             ` Zwane Mwaikambo
2003-11-15 19:52             ` Zwane Mwaikambo
2003-11-15 19:52               ` Zwane Mwaikambo
2003-11-17 21:46             ` Zwane Mwaikambo
2003-11-17 21:46               ` Zwane Mwaikambo
2003-11-17 22:42               ` Linus Torvalds
2003-11-17 22:42                 ` Linus Torvalds
2003-11-17 23:01                 ` Zwane Mwaikambo
2003-11-17 23:01                   ` Zwane Mwaikambo
2003-11-17 23:14                   ` Zwane Mwaikambo
2003-11-17 23:14                     ` Zwane Mwaikambo
2003-11-18  7:21                     ` Zwane Mwaikambo
2003-11-18  7:21                       ` Zwane Mwaikambo
2003-11-18 15:47                       ` Linus Torvalds
2003-11-18 15:47                         ` Linus Torvalds
2003-11-18 16:16                         ` Zwane Mwaikambo
2003-11-18 16:16                           ` Zwane Mwaikambo
2003-11-18 16:37                           ` Linus Torvalds
2003-11-18 16:37                             ` Linus Torvalds
2003-11-18 17:08                             ` Zwane Mwaikambo
2003-11-18 17:08                               ` Zwane Mwaikambo
2003-11-18 17:38                               ` Martin J. Bligh
2003-11-18 17:38                                 ` Martin J. Bligh
2003-11-18 17:22                                 ` Zwane Mwaikambo
2003-11-18 17:22                                   ` Zwane Mwaikambo
2003-11-19 20:32                             ` Matt Mackall
2003-11-19 20:32                               ` Matt Mackall
2003-11-19 23:09                               ` Matt Mackall
2003-11-19 23:09                                 ` Matt Mackall
2003-11-20  7:14                                 ` Zwane Mwaikambo
2003-11-20  7:14                                   ` Zwane Mwaikambo
2003-11-20  7:44                                 ` Matt Mackall
2003-11-20  7:44                                   ` Matt Mackall
2003-11-20  7:53                                   ` Andrew Morton
2003-11-20  7:53                                     ` Andrew Morton
2003-11-20  8:13                                   ` Matt Mackall
2003-11-20  8:13                                     ` Matt Mackall
2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 19:08   ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
2003-11-14 18:59     ` 2.6.0-test9-mm3 Andrew Morton
2003-11-14 19:32     ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-14 19:32       ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-14 20:27       ` 2.6.0-test9-mm3 John Stoffel
2003-11-14 20:27         ` 2.6.0-test9-mm3 John Stoffel
2003-11-15  1:01         ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-15  1:01           ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
2003-11-14 19:10     ` 2.6.0-test9-mm3 Badari Pulavarty
2003-11-14 20:29     ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 20:29       ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-17 20:58       ` 2.6.0-test9-mm3 bill davidsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031126075513.GA3902@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=daniel@osdl.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.