2.5.70-mm8: freeze after starting X

public inbox for linux-kernel@vger.kernel.org

* 2.5.70-mm8: freeze after starting X
@ 2003-06-11 22:17 Bryan O'Sullivan
  2003-06-11 22:36 ` Robert Love
  2003-06-11 22:41 ` Andrew Morton
  0 siblings, 2 replies; 22+ messages in thread
From: Bryan O'Sullivan @ 2003-06-11 22:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

I just upgraded from -mm3 (which I'd been running solidly for over a
week) to -mm8, and find that the system freezes hard after I start the X
server.  After X starts, lifetime varies from zero to maybe 20 seconds
of app launching, then everything locks up.

At this point, the machine is still pingable, but daemons like sshd
don't respond, and I can't see any logs.  After a reboot back to -mm3,
there's nothing suspicious in /var/log.

	<b


* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 22:17 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
@ 2003-06-11 22:36 ` Robert Love
  2003-06-11 22:41 ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Robert Love @ 2003-06-11 22:36 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, linux-kernel

On Wed, 2003-06-11 at 15:17, Bryan O'Sullivan wrote:
> I just upgraded from -mm3 (which I'd been running solidly for over a
> week) to -mm8, and find that the system freezes hard after I start the X
> server.  After X starts, lifetime varies from zero to maybe 20 seconds
> of app launching, then everything locks up.
> 
> At this point, the machine is still pingable, but daemons like sshd
> don't respond, and I can't see any logs.  After a reboot back to -mm3,
> there's nothing suspicious in /var/log.

Same problem here. It started happening in -mm6.

I have not narrowed it down to anything suspicious, and because it is in
X and my normal desktop machine I have not really debugged the issue.

Interrupts are still on, though...

	Robert Love



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 22:17 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
  2003-06-11 22:36 ` Robert Love
@ 2003-06-11 22:41 ` Andrew Morton
  2003-06-11 22:53   ` Bryan O'Sullivan
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Andrew Morton @ 2003-06-11 22:41 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: linux-kernel

"Bryan O'Sullivan" <bos@serpentine.com> wrote:
>
> I just upgraded from -mm3 (which I'd been running solidly for over a
> week) to -mm8, and find that the system freezes hard after I start the X
> server.  After X starts, lifetime varies from zero to maybe 20 seconds
> of app launching, then everything locks up.

You might try reverting
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch

> At this point, the machine is still pingable, but daemons like sshd
> don't respond, and I can't see any logs.  After a reboot back to -mm3,
> there's nothing suspicious in /var/log.

Something oopsed I'd say.  You using radeon?  That seems pretty oopsy
lately.




* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 22:41 ` Andrew Morton
@ 2003-06-11 22:53   ` Bryan O'Sullivan
  2003-06-11 23:11   ` Bryan O'Sullivan
  2003-06-11 23:34   ` Robert Love
  2 siblings, 0 replies; 22+ messages in thread
From: Bryan O'Sullivan @ 2003-06-11 22:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Wed, 2003-06-11 at 15:41, Andrew Morton wrote:

> You might try reverting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch

Will do.

> Something oopsed I'd say.  You using radeon?  That seems pretty oopsy
> lately.

Yep, R7500.

	<b



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 22:41 ` Andrew Morton
  2003-06-11 22:53   ` Bryan O'Sullivan
@ 2003-06-11 23:11   ` Bryan O'Sullivan
  2003-06-11 23:34   ` Robert Love
  2 siblings, 0 replies; 22+ messages in thread
From: Bryan O'Sullivan @ 2003-06-11 23:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Wed, 2003-06-11 at 15:41, Andrew Morton wrote:

> You might try reverting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch

No good.

	<b



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 22:41 ` Andrew Morton
  2003-06-11 22:53   ` Bryan O'Sullivan
  2003-06-11 23:11   ` Bryan O'Sullivan
@ 2003-06-11 23:34   ` Robert Love
  2003-06-12  0:18     ` Robert Love
  2 siblings, 1 reply; 22+ messages in thread
From: Robert Love @ 2003-06-11 23:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bryan O'Sullivan, linux-kernel

On Wed, 2003-06-11 at 15:41, Andrew Morton wrote:

> You might try reverting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch
>
> Something oopsed I'd say.  You using radeon?  That seems pretty oopsy
> lately.

I will debunk both theories: it's not Radeon (I have a Matrox) and it's
not the pci-init-ordering-fix patch (I already tried that).

	Robert Love



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-11 23:34   ` Robert Love
@ 2003-06-12  0:18     ` Robert Love
  2003-06-12  0:24       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Robert Love @ 2003-06-12  0:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bryan O'Sullivan, linux-kernel, piggin

On Wed, 2003-06-11 at 16:34, Robert Love wrote:

> I will debunk both theories: it's not Radeon (I have a Matrox) and it's
> not the pci-init-ordering-fix patch (I already tried that).

Ah, it is the anticipatory I/O scheduler.

There is a logic thinko somewhere... I have not found it yet, but I have
narrowed it down to something which the attached patch fixes (i.e.,
apply this patch and the problem is fixed).

Maybe Nick can see the bug and short circuit my search? The problem is
related to the as-autotune-write-batches patch.

	Robert Love


Fix system deadlock in 2.5.70-mm6 and beyond.

 drivers/block/as-iosched.c |   68 +++++++--------------------------------------
 1 files changed, 11 insertions(+), 57 deletions(-)


diff -urN linux-2.5.70-mm8/drivers/block/as-iosched.c linux/drivers/block/as-iosched.c
--- linux-2.5.70-mm8/drivers/block/as-iosched.c	2003-06-11 17:12:02.919110360 -0700
+++ linux/drivers/block/as-iosched.c	2003-06-11 17:05:59.000000000 -0700
@@ -52,7 +52,7 @@
  * See, the problem is: we can send a lot of writes to disk cache / TCQ in
  * a short amount of time...
  */
-#define default_write_batch_expire (HZ / 20)
+#define default_write_batch_expire (5)
 
 /*
  * max time we may wait to anticipate a read (default around 6ms)
@@ -135,9 +135,6 @@
 	unsigned long last_check_fifo[2];
 	int changed_batch;
 	int batch_data_dir;		/* current batch REQ_SYNC / REQ_ASYNC */
-	int write_batch_count;		/* max # of reqs in a write batch */
-	int current_write_count;	/* how many requests left this batch */
-	int write_batch_idled;		/* has the write batch gone idle? */
 	mempool_t *arq_pool;
 
 	enum anticipation_status antic_status;
@@ -938,35 +935,6 @@
 }
 
 /*
- * Gathers timings and resizes the write batch automatically
- */
-void update_write_batch(struct as_data *ad)
-{
-	unsigned long batch = ad->batch_expire[REQ_ASYNC];
-	long write_time;
-
-	write_time = (jiffies - ad->current_batch_expires) + batch;
-	if (write_time < 0)
-		write_time = 0;
-
-	if (write_time > batch + 5 && !ad->write_batch_idled) {
-		if (write_time / batch > 2)
-			ad->write_batch_count /= 2;
-		else
-			ad->write_batch_count--;
-		
-	} else if (write_time + 5 < batch && ad->current_write_count == 0) {
-		if (batch / write_time > 2)
-			ad->write_batch_count *= 2;
-		else
-			ad->write_batch_count++;
-	}
-
-	if (ad->write_batch_count < 1)
-		ad->write_batch_count = 1;
-}
-
-/*
  * as_completed_request is to be called when a request has completed and
  * returned something to the requesting process, be it an error or data.
  */
@@ -981,7 +949,8 @@
 		return;
 	}
 
-	WARN_ON(blk_fs_request(rq) && arq->state == AS_RQ_NEW);
+	if (blk_fs_request(rq) && arq->state == AS_RQ_NEW)
+		printk(KERN_INFO "warning: as_completed_request got bad request\n");
 				
 	if (arq->state != AS_RQ_DISPATCHED)
 		return;
@@ -999,7 +968,6 @@
 	 */
 	if (ad->batch_data_dir == REQ_SYNC && ad->changed_batch
 			&& ad->batch_data_dir == arq->is_sync) {
-		update_write_batch(ad);
 		ad->current_batch_expires = jiffies +
 				ad->batch_expire[REQ_SYNC];
 		ad->changed_batch = 0;
@@ -1151,11 +1119,10 @@
 		return 0;
 
 	if (ad->batch_data_dir == REQ_SYNC)
-		/* TODO! add a check so a complete fifo gets written? */
-		return time_after(jiffies, ad->current_batch_expires);
+		return time_after(jiffies, ad->current_batch_expires) &&
+		 	time_after(jiffies, ad->fifo_expire[REQ_SYNC]);
 
-	return time_after(jiffies, ad->current_batch_expires)
-		|| ad->current_write_count == 0;
+	return !ad->current_batch_expires;
 }
 
 /*
@@ -1187,9 +1154,8 @@
 			put_as_io_context(&ad->as_io_context);
 			ad->as_io_context = NULL;
 		}
-
-		if (ad->current_write_count != 0)
-			ad->current_write_count--;
+		if (ad->current_batch_expires)
+			ad->current_batch_expires--;
 	}
 	ad->aic_finished = 0;
 
@@ -1218,12 +1184,6 @@
 	const int reads = !list_empty(&ad->fifo_list[REQ_SYNC]);
 	const int writes = !list_empty(&ad->fifo_list[REQ_ASYNC]);
 
-	/* Signal that the write batch was uncontended, so we can't time it */
-	if (ad->batch_data_dir == REQ_ASYNC && !reads) {
-		if (ad->current_write_count == 0 || !writes)
-			ad->write_batch_idled = 1;
-	}
-	
 	if (!(reads || writes)
 		|| ad->antic_status == ANTIC_WAIT_REQ
 		|| ad->antic_status == ANTIC_WAIT_NEXT
@@ -1288,8 +1248,7 @@
  		if (ad->batch_data_dir == REQ_SYNC)
  			ad->changed_batch = 1;
 		ad->batch_data_dir = REQ_ASYNC;
-		ad->current_write_count = ad->write_batch_count;
-		ad->write_batch_idled = 0;
+		ad->current_batch_expires = ad->batch_expire[REQ_ASYNC];
 		arq = ad->next_arq[ad->batch_data_dir];
 		goto dispatch_request;
 	}
@@ -1311,11 +1270,9 @@
 	if (ad->changed_batch) {
 		if (ad->changed_batch == 1 && ad->nr_dispatched)
 			return 0;
-		if (ad->batch_data_dir == REQ_ASYNC) {
-			ad->current_batch_expires = jiffies +
-					ad->batch_expire[REQ_ASYNC];
+		if (ad->changed_batch == 1 && ad->batch_data_dir == REQ_ASYNC)
 			ad->changed_batch = 0;
-		} else
+		else
 			ad->changed_batch = 2;
 		arq->request->flags |= REQ_HARDBARRIER;
 	}
@@ -1727,9 +1684,6 @@
 	e->elevator_data = ad;
 
 	ad->current_batch_expires = jiffies + ad->batch_expire[REQ_SYNC];
-	ad->write_batch_count = ad->batch_expire[REQ_ASYNC] / 10;
-	if (ad->write_batch_count < 2)
-		ad->write_batch_count = 2;
 	return 0;
 }
 




* Re: 2.5.70-mm8: freeze after starting X
  2003-06-12  0:18     ` Robert Love
@ 2003-06-12  0:24       ` Andrew Morton
  2003-06-12  1:10         ` [patch] as-iosched divide by zero fix Robert Love
  2003-06-12 17:19         ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2003-06-12  0:24 UTC (permalink / raw)
  To: Robert Love; +Cc: bos, linux-kernel, piggin

Robert Love <rml@tech9.net> wrote:
>
> On Wed, 2003-06-11 at 16:34, Robert Love wrote:
> 
> > I will debunk both theories: it's not Radeon (I have a Matrox) and it's
> > not the pci-init-ordering-fix patch (I already tried that).
> 
> Ah, it is the anticipatory I/O scheduler.
> 
> There is a logic thinko somewhere... I have not found it yet, but I have
> narrowed it down to something which the attached patch fixes (i.e.,
> apply this patch and the problem is fixed).
> 
> Maybe Nick can see the bug and short circuit my search? The problem is
> related to the as-autotune-write-batches patch.

Do you know what the actual oops is?

Odd that starting the X server triggers it.  Be interesting if your patch
fixes things for Bryan.

There appear to be several divide-by-zero possibilities in there.  A random
patch would be:



diff -puN drivers/block/as-iosched.c~a drivers/block/as-iosched.c
--- 25/drivers/block/as-iosched.c~a	Wed Jun 11 17:23:42 2003
+++ 25-akpm/drivers/block/as-iosched.c	Wed Jun 11 17:23:42 2003
@@ -950,13 +950,13 @@ void update_write_batch(struct as_data *
 		write_time = 0;
 
 	if (write_time > batch + 5 && !ad->write_batch_idled) {
-		if (write_time / batch > 2)
+		if (batch && (write_time / batch > 2))
 			ad->write_batch_count /= 2;
 		else
 			ad->write_batch_count--;
 		
 	} else if (write_time + 5 < batch && ad->current_write_count == 0) {
-		if (batch / write_time > 2)
+		if (write_time && (batch / write_time > 2))
 			ad->write_batch_count *= 2;
 		else
 			ad->write_batch_count++;

_



* [patch] as-iosched divide by zero fix
  2003-06-12  0:24       ` Andrew Morton
@ 2003-06-12  1:10         ` Robert Love
  2003-06-12  1:22           ` Andrew Morton
                             ` (2 more replies)
  2003-06-12 17:19         ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
  1 sibling, 3 replies; 22+ messages in thread
From: Robert Love @ 2003-06-12  1:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bos, linux-kernel, piggin

On Wed, 2003-06-11 at 17:24, Andrew Morton wrote:

> Do you know what the actual oops is?

I got it all figured out now.

It is a divide by zero in update_write_batch() called from
as_completed_request().

> Odd that starting the X server triggers it.  Be interesting if your patch
> fixes things for Bryan.

I reproduced it without X.

The divide by zero is on line 959 with the divide by 'write_time'. It
can obviously be zero (see line 950). The divide by 'batch' on line 953
seems safe.
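
To make the failure concrete, here is a minimal user-space sketch of the
arithmetic in update_write_batch(), written as a plain function. The names
wb_state and resize_write_batch are hypothetical stand-ins, not kernel code,
and both values are modelled as long for simplicity. The branch structure
follows the function as quoted in this thread, with guards added on both
divisions.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields of struct as_data used here. */
struct wb_state {
	int write_batch_count;   /* max # of reqs in a write batch */
	int current_write_count; /* how many requests left this batch */
	int write_batch_idled;   /* has the write batch gone idle? */
};

/*
 * User-space model of update_write_batch().  'batch' is the write batch
 * expiry in jiffies; 'write_time' is how long the batch actually took.
 * Returns the new write_batch_count.
 */
static int resize_write_batch(struct wb_state *s, long batch, long write_time)
{
	assert(s);

	if (write_time < 0)
		write_time = 0;	/* the clamp: write_time may end up as 0 */

	if (write_time > batch + 5 && !s->write_batch_idled) {
		/* batch guard: only zero if the expiry tunable is set to 0 */
		if (batch && write_time / batch > 2)
			s->write_batch_count /= 2;
		else
			s->write_batch_count--;
	} else if (write_time + 5 < batch && s->current_write_count == 0) {
		/* without the write_time test, write_time == 0 divides by zero */
		if (write_time && batch / write_time > 2)
			s->write_batch_count *= 2;
		else
			s->write_batch_count++;
	}

	if (s->write_batch_count < 1)
		s->write_batch_count = 1;
	return s->write_batch_count;
}
```

With batch = 50 and write_time = 0, the second branch is taken; the guard
skips the division and the count is simply incremented.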

The correct patch is below.

Most important question: why are only some of us seeing this?

	Robert Love


Fix as-iosched divide-by-zero bug.

 drivers/block/as-iosched.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


diff -urN linux-2.5.70-mm8/drivers/block/as-iosched.c linux/drivers/block/as-iosched.c
--- linux-2.5.70-mm8/drivers/block/as-iosched.c	2003-06-11 17:12:02.000000000 -0700
+++ linux/drivers/block/as-iosched.c	2003-06-11 18:04:15.222619392 -0700
@@ -954,9 +954,9 @@
 			ad->write_batch_count /= 2;
 		else
 			ad->write_batch_count--;
-		
+
 	} else if (write_time + 5 < batch && ad->current_write_count == 0) {
-		if (batch / write_time > 2)
+		if (write_time && (batch / write_time > 2))
 			ad->write_batch_count *= 2;
 		else
 			ad->write_batch_count++;





* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:10         ` [patch] as-iosched divide by zero fix Robert Love
@ 2003-06-12  1:22           ` Andrew Morton
  2003-06-12  1:28             ` Robert Love
  2003-06-12  1:31             ` Nick Piggin
  2003-06-12  1:22           ` Nick Piggin
  2003-06-12  1:54           ` Steven Cole
  2 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2003-06-12  1:22 UTC (permalink / raw)
  To: Robert Love; +Cc: bos, linux-kernel, piggin

Robert Love <rml@tech9.net> wrote:
>
> Fix as-iosched divide-by-zero bug.

hrm, OK.  Still not convinced about `batch'.

How about this?

--- 25/drivers/block/as-iosched.c~as-div-by-zero-fix	2003-06-11 18:17:04.000000000 -0700
+++ 25-akpm/drivers/block/as-iosched.c	2003-06-11 18:20:58.000000000 -0700
@@ -930,13 +930,12 @@ void update_write_batch(struct as_data *
 		write_time = 0;
 
 	if (write_time > batch + 5 && !ad->write_batch_idled) {
-		if (write_time / batch > 2)
+		if (write_time > batch * 2)
 			ad->write_batch_count /= 2;
 		else
 			ad->write_batch_count--;
-		
 	} else if (write_time + 5 < batch && ad->current_write_count == 0) {
-		if (batch / write_time > 2)
+		if (batch > write_time * 2)
 			ad->write_batch_count *= 2;
 		else
 			ad->write_batch_count++;

_



* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:22           ` Andrew Morton
@ 2003-06-12  1:28             ` Robert Love
  2003-06-12  1:41               ` Nick Piggin
  2003-06-12  1:31             ` Nick Piggin
  1 sibling, 1 reply; 22+ messages in thread
From: Robert Love @ 2003-06-12  1:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bos, linux-kernel, piggin

On Wed, 2003-06-11 at 18:22, Andrew Morton wrote:

> hrm, OK.  Still not convinced about `batch'.

batch is only zero if {read,write}_batch_expire are zero. Nick, is that
legal/desirable? Or should we prevent that in the sysfs interface?

> -		if (write_time / batch > 2)
> +		if (write_time > batch * 2)
>
> -		if (batch / write_time > 2)
> +		if (batch > write_time * 2)

Much better! :)

	Robert Love



* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:28             ` Robert Love
@ 2003-06-12  1:41               ` Nick Piggin
  0 siblings, 0 replies; 22+ messages in thread
From: Nick Piggin @ 2003-06-12  1:41 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel



Robert Love wrote:

>On Wed, 2003-06-11 at 18:22, Andrew Morton wrote:
>
>
>>hrm, OK.  Still not convinced about `batch'.
>>
>
>batch is only zero if {read,write}_batch_expire are zero. Nick, is that
>legal/desirable? Or should we prevent that in the sysfs interface?
>

Yeah it shouldn't really be allowed, but I think everything
will keep working (with this change), even writes.




* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:22           ` Andrew Morton
  2003-06-12  1:28             ` Robert Love
@ 2003-06-12  1:31             ` Nick Piggin
  2003-06-12  2:34               ` John Stoffel
  1 sibling, 1 reply; 22+ messages in thread
From: Nick Piggin @ 2003-06-12  1:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, bos, linux-kernel



Andrew Morton wrote:

>Robert Love <rml@tech9.net> wrote:
>
>>Fix as-iosched divide-by-zero bug.
>>
>
>hrm, OK.  Still not convinced about `batch'.
>
>How about this?
>

Yeah, that's the way to do it, of course. It was too
jumpy at that setting though, so make it batch*3
(or <<1+batch if you don't want the multiply).
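
A small sketch of what these comparisons mean in integer arithmetic. For
positive batch, the original test write_time / batch > 2 is true exactly
when write_time >= batch * 3, so a multiply form with a factor of 2 fires
earlier than the division form did. The helper names below are hypothetical.
The shift variant is written with explicit parentheses because + binds
tighter than << in C, so the shorthand "<<1+batch" would not parse as
intended if typed literally.

```c
#include <assert.h>

/* Original division form: undefined behaviour when batch == 0. */
static int over_div(long write_time, long batch)
{
	assert(batch != 0);
	return write_time / batch > 2;
}

/* Division-free form with a configurable factor; safe for batch == 0. */
static int over_mul(long write_time, long batch, long factor)
{
	return write_time > batch * factor;
}

/* batch * 3 without a multiply: note the parentheses around the shift. */
static long times3_shift(long batch)
{
	return (batch << 1) + batch;
}
```

With batch = 50, over_div() first becomes true at write_time = 150,
over_mul(..., 2) at 101, and over_mul(..., 3) at 151.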

>
>--- 25/drivers/block/as-iosched.c~as-div-by-zero-fix	2003-06-11 18:17:04.000000000 -0700
>+++ 25-akpm/drivers/block/as-iosched.c	2003-06-11 18:20:58.000000000 -0700
>@@ -930,13 +930,12 @@ void update_write_batch(struct as_data *
> 		write_time = 0;
> 
> 	if (write_time > batch + 5 && !ad->write_batch_idled) {
>-		if (write_time / batch > 2)
>+		if (write_time > batch * 2)
> 			ad->write_batch_count /= 2;
> 		else
> 			ad->write_batch_count--;
>-		
> 	} else if (write_time + 5 < batch && ad->current_write_count == 0) {
>-		if (batch / write_time > 2)
>+		if (batch > write_time * 2)
> 			ad->write_batch_count *= 2;
> 		else
> 			ad->write_batch_count++;
>
>_
>
>
>  
>



* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:31             ` Nick Piggin
@ 2003-06-12  2:34               ` John Stoffel
  2003-06-12  4:05                 ` Nick Piggin
  0 siblings, 1 reply; 22+ messages in thread
From: John Stoffel @ 2003-06-12  2:34 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Robert Love, bos, linux-kernel

Nick> Yeah, that's the way to do it, of course. It was too jumpy at
Nick> that setting though, so make it batch*3 (or <<1+batch if you
Nick> don't want the multiply).

Aren't we trying to get away from magic constants like this?  Or at
least a better idea of why batch*3 is better than batch*2?  I will
admit I haven't had the chance to peer into the code, so I'm probably
just being stupid (and lazy) here to speak up.

I guess the real question I have is what happens if we make it
batch*100, how does that affect the algorithm?  And if going from 2 to
3 makes such a difference, doesn't that point to a scaling issue,
i.e. we should have 200 and 300 here, so we can try out 250 as an
intermediate value.

*shrug* Just trying to understand...

John


* Re: [patch] as-iosched divide by zero fix
  2003-06-12  2:34               ` John Stoffel
@ 2003-06-12  4:05                 ` Nick Piggin
  0 siblings, 0 replies; 22+ messages in thread
From: Nick Piggin @ 2003-06-12  4:05 UTC (permalink / raw)
  To: John Stoffel; +Cc: Andrew Morton, Robert Love, bos, linux-kernel

John Stoffel wrote:

>Nick> Yeah, that's the way to do it, of course. It was too jumpy at
>Nick> that setting though, so make it batch*3 (or <<1+batch if you
>Nick> don't want the multiply).
>
>Aren't we trying to get away from magic constants like this?  Or at
>least a better idea of why batch*3 is better than batch*2?  I will
>admit I haven't had the chance to peer into the code, so I'm probably
>just being stupid (and lazy) here to speak up.
>
>I guess the real question I have is what happens if we make it
>batch*100, how does that affect the algorithm?  And if going from 2 to
>3 makes such a difference, doesn't that point to a scaling issue,
>i.e. we should have 200 and 300 here, so we can try out 250 as an
>intermediate value.
>
>*shrug* Just trying to understand...
>

It's not very critical. It is used to estimate how many
requests will take t ms to complete, based on how many
requests we sent last time around, and how long that
took.

As long as it increments when we were below the estimate and
decrements when above, it should be OK. The mul/div were
put there in order to adapt more quickly to changes in
the request pattern.

You would think that if a batch of 10 requests took
twice as long as we wanted, we might as well submit 5
requests, but in testing I found that this makes the
numbers jump around too much, while a 3* threshold
smoothed it over while still adapting to changes nicely.
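
The adaptation described above can be sketched as a small division-free
function. This is a simplified, hypothetical model (the name
adapt_batch_count and the factor parameter are not from the kernel source),
and it leaves out the idle and remaining-request checks that the real
function applies. The slack of 5 jiffies is taken from the quoted code.

```c
#include <assert.h>

#define SLACK 5	/* jiffies of tolerance, as in the quoted source */

/*
 * 'count' requests were sent; the batch was meant to take 'target'
 * jiffies and actually took 'took'.  Returns the count to use next time.
 */
static int adapt_batch_count(int count, long target, long took, long factor)
{
	assert(factor >= 1);

	if (took > target + SLACK) {
		/* slower than wanted: halve if far off, else step down by one */
		if (took > target * factor)
			count /= 2;
		else
			count--;
	} else if (took + SLACK < target) {
		/* faster than wanted: double if far off, else step up by one */
		if (target > took * factor)
			count *= 2;
		else
			count++;
	}
	return count < 1 ? 1 : count;
}
```

With a 50-jiffy target, a batch of 10 that took 110 jiffies is halved to 5
under a 2x threshold but only stepped down to 9 under a 3x threshold, which
is the smoothing effect described above.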


* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:10         ` [patch] as-iosched divide by zero fix Robert Love
  2003-06-12  1:22           ` Andrew Morton
@ 2003-06-12  1:22           ` Nick Piggin
  2003-06-12  1:29             ` Robert Love
  2003-06-12  1:54           ` Steven Cole
  2 siblings, 1 reply; 22+ messages in thread
From: Nick Piggin @ 2003-06-12  1:22 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel



Robert Love wrote:

>On Wed, 2003-06-11 at 17:24, Andrew Morton wrote:
>
>
>>Do you know what the actual oops is?
>>
>
>I got it all figured out now.
>
>It is a divide by zero in update_write_batch() called from
>as_completed_request().
>
>
>>Odd that starting the X server triggers it.  Be interesting if your patch
>>fixes things for Bryan.
>>
>
>I reproduced it without X.
>
>The divide by zero is on line 959 with the divide by 'write_time'. It
>can obviously be zero (see line 950). The divide by 'batch' on line 953
>seems safe.
>
>The correct patch is below.
>

Probably put in the other check to be on the safe side.
And can the check be if (!write_time || (batch / write_time > 2))?

>
>
>Most important question: why are only some of us seeing this?
>

It would occur if a write batch didn't take any jiffies, which
isn't very likely. The HZ=100 change probably brought it out.
Thanks guys.
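
For a sense of scale, here is a sketch of how the tick rate affects the
measured duration of a batch. It assumes elapsed jiffies can be approximated
by flooring the elapsed milliseconds to whole ticks, which ignores where the
tick boundaries actually fall. Both function names are hypothetical; the
HZ / 20 default comes from the quoted source.

```c
#include <assert.h>

/* Approximate whole jiffies elapsed in 'ms' milliseconds at tick rate 'hz'. */
static long ms_to_whole_jiffies(long ms, long hz)
{
	assert(hz > 0 && ms >= 0);
	return ms * hz / 1000;
}

/* The default write batch expiry from the quoted source: HZ / 20 jiffies. */
static long write_batch_expire_jiffies(long hz)
{
	assert(hz > 0);
	return hz / 20;
}
```

Under this approximation, a batch that completes in 8 ms measures as 0
jiffies at HZ=100 but as 8 jiffies at HZ=1000, so a zero write_time is more
likely at the lower tick rate.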




* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:22           ` Nick Piggin
@ 2003-06-12  1:29             ` Robert Love
  0 siblings, 0 replies; 22+ messages in thread
From: Robert Love @ 2003-06-12  1:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, bos, linux-kernel

On Wed, 2003-06-11 at 18:22, Nick Piggin wrote:

> It would occur if a write batch didn't take any jiffies, which
> isn't very likely. The HZ=100 change probably brought it out.

I reverted that change (yah yah, I am no help). I have seen the problem
since mm6, but just today got around to looking into it.

Anyhow, it's fixed.

	Robert Love



* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:10         ` [patch] as-iosched divide by zero fix Robert Love
  2003-06-12  1:22           ` Andrew Morton
  2003-06-12  1:22           ` Nick Piggin
@ 2003-06-12  1:54           ` Steven Cole
  2003-06-12  2:01             ` Robert Love
  2 siblings, 1 reply; 22+ messages in thread
From: Steven Cole @ 2003-06-12  1:54 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel, piggin

On Wed, 2003-06-11 at 19:10, Robert Love wrote:
> On Wed, 2003-06-11 at 17:24, Andrew Morton wrote:
> 
> > Do you know what the actual oops is?
> 
> I got it all figured out now.
> 
> It is a divide by zero in update_write_batch() called from
> as_completed_request().
> 
> > Odd that starting the X server triggers it.  Be interesting if your patch
> > fixes things for Bryan.
> 
> I reproduced it without X.
> 
> The divide by zero is on line 959 with the divide by 'write_time'. It
> can obviously be zero (see line 950). The divide by 'batch' on line 953
> seems safe.
> 
> The correct patch is below.
> 
> Most important question: why are only some of us seeing this?
> 
> 	Robert Love

With regards to the last, here is an anti-AOL! for the oops.  I ran
2.5.70-mm8 for several hours today, doing kernel compiles and running
dbench 64 on ext3, xfs, and jfs.  No oops.  

All while running X (although that now seems moot).  Base distro is RH9
if that could matter.  System is UP (PIII), PREEMPT, IDE, i810 chipset.

Steven



* Re: [patch] as-iosched divide by zero fix
  2003-06-12  1:54           ` Steven Cole
@ 2003-06-12  2:01             ` Robert Love
  0 siblings, 0 replies; 22+ messages in thread
From: Robert Love @ 2003-06-12  2:01 UTC (permalink / raw)
  To: Steven Cole; +Cc: Andrew Morton, bos, linux-kernel, piggin

On Wed, 2003-06-11 at 18:54, Steven Cole wrote:

> With regards to the last, here is an anti-AOL! for the oops.  I ran
> 2.5.70-mm8 for several hours today, doing kernel compiles and running
> dbench 64 on ext3, xfs, and jfs.  No oops.  
> 
> All while running X (although that now seems moot).  Base distro is RH9
> if that could matter.  System is UP (PIII), PREEMPT, IDE, i810 chipset.

Right. Most people are not seeing this.

I have a system very similar to yours, interestingly. It is just random
timings I guess.

	Robert Love



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-12  0:24       ` Andrew Morton
  2003-06-12  1:10         ` [patch] as-iosched divide by zero fix Robert Love
@ 2003-06-12 17:19         ` Bryan O'Sullivan
  2003-06-12 17:29           ` Bryan O'Sullivan
  2003-06-12 17:30           ` Davide Libenzi
  1 sibling, 2 replies; 22+ messages in thread
From: Bryan O'Sullivan @ 2003-06-12 17:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, linux-kernel, piggin

On Wed, 2003-06-11 at 17:24, Andrew Morton wrote:

> Odd that starting the X server triggers it.  Be interesting if your patch
> fixes things for Bryan.

I think Robert and I are seeing different things.  For me, -mm6 is fine
(unlike Robert's case), -mm7 oopses in the PCI init code during early
boot (somewhere in the radeon init stuff, can't capture the oops
easily), and -mm8 gives itself a wedgie a few seconds after starting X.

I'm about to try, um, whichever of the umpty-ump patches that went back
and forth looks most plausible.

	<b


* Re: 2.5.70-mm8: freeze after starting X
  2003-06-12 17:19         ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
@ 2003-06-12 17:29           ` Bryan O'Sullivan
  2003-06-12 17:30           ` Davide Libenzi
  1 sibling, 0 replies; 22+ messages in thread
From: Bryan O'Sullivan @ 2003-06-12 17:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, linux-kernel, piggin

On Thu, 2003-06-12 at 10:19, Bryan O'Sullivan wrote:

> I'm about to try, um, whichever of the umpty-ump patches that went back
> and forth looks most plausible.

Tried your two-liner, Andrew, but -mm8 is as freezy as ever.  I'll see
if I can hook up a serial console and find an oops at some point.

	<b



* Re: 2.5.70-mm8: freeze after starting X
  2003-06-12 17:19         ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan
  2003-06-12 17:29           ` Bryan O'Sullivan
@ 2003-06-12 17:30           ` Davide Libenzi
  1 sibling, 0 replies; 22+ messages in thread
From: Davide Libenzi @ 2003-06-12 17:30 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Andrew Morton, Robert Love, Linux Kernel Mailing List, piggin

On Thu, 12 Jun 2003, Bryan O'Sullivan wrote:

> On Wed, 2003-06-11 at 17:24, Andrew Morton wrote:
>
> > Odd that starting the X server triggers it.  Be interesting if your patch
> > fixes things for Bryan.
>
> I think Robert and I are seeing different things.  For me, -mm6 is fine
> (unlike Robert's case), -mm7 oopses in the PCI init code during early
> boot (somewhere in the radeon init stuff, can't capture the oops
> easily), and -mm8 gives itself a wedgie a few seconds after starting X.
>
> I'm about to try, um, whichever of the umpty-ump patches that went back
> and forth looks most plausible.

I'm seeing total freezes with 2.5.69 on both my home laptop (SiS650
chipset) and my machine at work (Intel Corp. 82845G/GL). Using X with
Gnome (RH9), the system ends up completely frozen after a random amount
of time. This happens with almost no activity on the machine, which makes
me think it is not load-related. IRQs are disabled and the IDE drive light
remains on. I planned to debug this but haven't had time yet. I set up the
NMI oopser, but I need some way to capture the dump, since I'm in graphics
mode when the NMI triggers. I was thinking about LKCD. It has never
happened in console mode, so it must be X/Gnome + 2.5.69.

- Davide
