public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] sys_sync livelock fix
@ 2002-02-12 23:13 Andrew Morton
  2002-02-12 23:31 ` Alan Cox
  2002-02-13  1:36 ` What is a livelock? (was: [patch] sys_sync livelock fix) Olaf Dietsche
  0 siblings, 2 replies; 48+ messages in thread
From: Andrew Morton @ 2002-02-12 23:13 UTC (permalink / raw)
  To: Alan Cox, lkml

The get_request fairness patch exposed a livelock
in the buffer layer.  write_unlocked_buffers() will
not terminate while other tasks are generating write traffic.

The patch simply bales out after writing all the buffers which
were dirty at the time the function was called, rather than
keeping on trying to write buffers until the list is empty.

Given that /bin/sync calls write_unlocked_buffers() three times,
that's good enough.  sync still takes aaaaaages, but it terminates.




--- linux-2.4.18-pre9-ac2/fs/buffer.c	Tue Feb 12 12:26:41 2002
+++ ac24/fs/buffer.c	Tue Feb 12 14:39:39 2002
@@ -189,12 +189,13 @@ static void write_locked_buffers(struct 
  * return without it!
  */
 #define NRSYNC (32)
-static int write_some_buffers(kdev_t dev)
+static int write_some_buffers(kdev_t dev, signed long *nr_to_write)
 {
 	struct buffer_head *next;
 	struct buffer_head *array[NRSYNC];
 	unsigned int count;
 	int nr;
+	int ret;
 
 	next = lru_list[BUF_DIRTY];
 	nr = nr_buffers_type[BUF_DIRTY];
@@ -213,29 +214,38 @@ static int write_some_buffers(kdev_t dev
 			array[count++] = bh;
 			if (count < NRSYNC)
 				continue;
-
-			spin_unlock(&lru_list_lock);
-			write_locked_buffers(array, count);
-			return -EAGAIN;
+			ret = -EAGAIN;
+			goto writeout;
 		}
 		unlock_buffer(bh);
 		__refile_buffer(bh);
 	}
+	ret = 0;
+writeout:
 	spin_unlock(&lru_list_lock);
-
-	if (count)
+	if (count) {
 		write_locked_buffers(array, count);
-	return 0;
+		if (nr_to_write)
+			*nr_to_write -= count;
+	}
+	return ret;
 }
 
 /*
- * Write out all buffers on the dirty list.
+ * Because we drop the locking during I/O it's not possible
+ * to write out all the buffers.  So the only guarantee that
+ * we can make here is that we write out all the buffers which
+ * were dirty at the time write_unlocked_buffers() was called.
+ * fsync_dev() calls in here three times, so we end up writing
+ * many more buffers than ever appear on BUF_DIRTY.
  */
 static void write_unlocked_buffers(kdev_t dev)
 {
+	signed long nr_to_write = nr_buffers_type[BUF_DIRTY] * 2;
+
 	do {
 		spin_lock(&lru_list_lock);
-	} while (write_some_buffers(dev));
+	} while (write_some_buffers(dev, &nr_to_write) && (nr_to_write > 0));
 	run_task_queue(&tq_disk);
 }
 
@@ -1085,7 +1095,7 @@ void balance_dirty(void)
 	 */
 	if (state > 0) {
 		spin_lock(&lru_list_lock);
-		write_some_buffers(NODEV);
+		write_some_buffers(NODEV, NULL);
 		wait_for_some_buffers(NODEV);
 	}
 }
@@ -2846,7 +2856,7 @@ static int sync_old_buffers(void)
 		bh = lru_list[BUF_DIRTY];
 		if (!bh || time_before(jiffies, bh->b_flushtime))
 			break;
-		if (write_some_buffers(NODEV))
+		if (write_some_buffers(NODEV, NULL))
 			continue;
 		return 0;
 	}
@@ -2945,7 +2955,7 @@ int bdflush(void *startup)
 		CHECK_EMERGENCY_SYNC
 
 		spin_lock(&lru_list_lock);
-		if (!write_some_buffers(NODEV) || balance_dirty_state() < 0) {
+		if (!write_some_buffers(NODEV, NULL) || balance_dirty_state() < 0) {
 			wait_for_some_buffers(NODEV);
 			interruptible_sleep_on(&bdflush_wait);
 		}


-

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [patch] sys_sync livelock fix
@ 2002-02-13  9:18 Andries.Brouwer
  0 siblings, 0 replies; 48+ messages in thread
From: Andries.Brouwer @ 2002-02-13  9:18 UTC (permalink / raw)
  To: davidsen, jgarzik; +Cc: akpm, alan, linux-kernel, viro

> Yow, your message inspired me to re-read SuSv2 and indeed confirm,

As a side note, these days you should be reading SuSv3,
it is an official standard now. See, for example,

http://www.UNIX-systems.org/version3/
http://www.opengroup.org/onlinepubs/007904975/toc.htm

> sync(2) schedules I/O but can return before completion

Don't forget that this standard does not describe what is
desirable, but describes the minimum guaranteed by all
Unices considered.

Having a sync that returns without having written the data
is not especially useful. Also without the sync this data
would have been written sooner or later.
We changed sync to wait, long ago, because otherwise shutdown
would cause filesystem corruption.

Andries

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [patch] sys_sync livelock fix
@ 2002-02-14  0:57 Andries.Brouwer
  0 siblings, 0 replies; 48+ messages in thread
From: Andries.Brouwer @ 2002-02-14  0:57 UTC (permalink / raw)
  To: davidsen, phillips; +Cc: akpm, alan, jgarzik, linux-kernel, viro

> it should work in the _best_ way, and if the standard got it wrong
> then the standard has to change.

: BTW: I think users would expect the system call to work as the standard
: specifies, not some better way which would break on non-Linux systems. Of
: course now working programs which conform to the standard DO break on
: Linux.

Let me repeat:
The standard describes a *minimum*.
A system that does not give more than this minimum would be
a very poor system indeed.

That POSIX does not require more than 14 bytes in a filename
and does not promise me more than 6 simultaneous processes
does not prevent us from having something better.

In this particular case (sync) the minimum required is
essentially empty. The proposed semantics: make sure that
before return all writes that were scheduled at the time
of the call seems entirely satisfactory.

Andries


(BTW Is your df broken? It is very long ago that my df did
a sync.)

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2002-02-18 22:19 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-02-12 23:13 [patch] sys_sync livelock fix Andrew Morton
2002-02-12 23:31 ` Alan Cox
2002-02-12 23:22   ` Andrew Morton
2002-02-13  0:28     ` Alan Cox
2002-02-13  3:28       ` Bill Davidsen
2002-02-13  3:46         ` Jeff Garzik
2002-02-13 15:11           ` Daniel Phillips
2002-02-13 22:24             ` Bill Davidsen
2002-02-13 22:41               ` Mike Fedyk
2002-02-14  0:26               ` Daniel Phillips
2002-02-14  0:37                 ` Andrew Morton
2002-02-14  0:49                   ` Daniel Phillips
2002-02-14  0:53                     ` Andrew Morton
2002-02-14  1:27                       ` Daniel Phillips
2002-02-14  1:29                         ` Andrew Morton
2002-02-14  1:59                     ` Mike Fedyk
2002-02-14  2:07                       ` Daniel Phillips
2002-02-13 23:31             ` Rob Landley
2002-02-14  0:44               ` Daniel Phillips
2002-02-12 23:29   ` Rik van Riel
2002-02-13  0:25     ` Alan Cox
2002-02-13  0:15       ` Rik van Riel
2002-02-13  0:36         ` Alan Cox
2002-02-13  0:36           ` Rik van Riel
2002-02-13  0:39           ` Andrew Morton
2002-02-13  3:42             ` Bill Davidsen
2002-02-13  3:54             ` Bill Davidsen
2002-02-13  4:01               ` Jeff Garzik
2002-02-13  4:53                 ` Bill Davidsen
2002-02-13 15:17                 ` Daniel Phillips
2002-02-13  4:29               ` Andrew Morton
2002-02-13  5:21                 ` Bill Davidsen
2002-02-13  5:35                   ` Andrew Morton
2002-02-18  2:29                     ` Bill Davidsen
2002-02-13 14:09                   ` bill davidsen
2002-02-13 15:29                   ` Daniel Phillips
2002-02-13 22:53                     ` Bill Davidsen
2002-02-14  0:33                       ` Daniel Phillips
2002-02-13  1:36 ` What is a livelock? (was: [patch] sys_sync livelock fix) Olaf Dietsche
2002-02-13  1:56   ` Andrew Morton
2002-02-13  2:30     ` Olaf Dietsche
2002-02-13  2:39       ` Andrew Morton
2002-02-13 16:19         ` Olaf Dietsche
2002-02-13  2:52       ` William Lee Irwin III
2002-02-18 22:19       ` David Schwartz
2002-02-13  2:33   ` Rob Landley
  -- strict thread matches above, loose matches on Subject: below --
2002-02-13  9:18 [patch] sys_sync livelock fix Andries.Brouwer
2002-02-14  0:57 Andries.Brouwer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox