* 2.5.70-mm8: freeze after starting X @ 2003-06-11 22:17 Bryan O'Sullivan 2003-06-11 22:36 ` Robert Love 2003-06-11 22:41 ` Andrew Morton 0 siblings, 2 replies; 22+ messages in thread From: Bryan O'Sullivan @ 2003-06-11 22:17 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel I just upgraded from -mm3 (which I'd been running solidly for over a week) to -mm8, and find that the system freezes hard after I start the X server. After X starts, lifetime varies from zero to maybe 20 seconds of app launching, then everything locks up. At this point, the machine is still pingable, but daemons like sshd don't respond, and I can't see any logs. After a reboot back to -mm3, there's nothing suspicious in /var/log. <b ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 22:17 2.5.70-mm8: freeze after starting X Bryan O'Sullivan @ 2003-06-11 22:36 ` Robert Love 2003-06-11 22:41 ` Andrew Morton 1 sibling, 0 replies; 22+ messages in thread From: Robert Love @ 2003-06-11 22:36 UTC (permalink / raw) To: Bryan O'Sullivan; +Cc: Andrew Morton, linux-kernel On Wed, 2003-06-11 at 15:17, Bryan O'Sullivan wrote: > I just upgraded from -mm3 (which I'd been running solidly for over a > week) to -mm8, and find that the system freezes hard after I start the X > server. After X starts, lifetime varies from zero to maybe 20 seconds > of app launching, then everything locks up. > > At this point, the machine is still pingable, but daemons like sshd > don't respond, and I can't see any logs. After a reboot back to -mm3, > there's nothing suspicious in /var/log. Same problem here. It started happening in -mm6. I have not narrowed it down to anything suspicious, and because it is in X and my normal desktop machine I have not really debugged the issue. Interrupts are still on, though... Robert Love ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 22:17 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 2003-06-11 22:36 ` Robert Love @ 2003-06-11 22:41 ` Andrew Morton 2003-06-11 22:53 ` Bryan O'Sullivan ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Andrew Morton @ 2003-06-11 22:41 UTC (permalink / raw) To: Bryan O'Sullivan; +Cc: linux-kernel "Bryan O'Sullivan" <bos@serpentine.com> wrote: > > I just upgraded from -mm3 (which I'd been running solidly for over a > week) to -mm8, and find that the system freezes hard after I start the X > server. After X starts, lifetime varies from zero to maybe 20 seconds > of app launching, then everything locks up. You might try reverting ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch > At this point, the machine is still pingable, but daemons like sshd > don't respond, and I can't see any logs. After a reboot back to -mm3, > there's nothing suspicious in /var/log. Something oopsed I'd say. You using radeon? That seems pretty oopsy lately. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 22:41 ` Andrew Morton @ 2003-06-11 22:53 ` Bryan O'Sullivan 2003-06-11 23:11 ` Bryan O'Sullivan 2003-06-11 23:34 ` Robert Love 2 siblings, 0 replies; 22+ messages in thread From: Bryan O'Sullivan @ 2003-06-11 22:53 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-06-11 at 15:41, Andrew Morton wrote: > You might try reverting > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch Will do. > Something oopsed I'd say. You using radeon? That seems pretty oopsy > lately. Yep, R7500. <b ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 22:41 ` Andrew Morton 2003-06-11 22:53 ` Bryan O'Sullivan @ 2003-06-11 23:11 ` Bryan O'Sullivan 2003-06-11 23:34 ` Robert Love 2 siblings, 0 replies; 22+ messages in thread From: Bryan O'Sullivan @ 2003-06-11 23:11 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-06-11 at 15:41, Andrew Morton wrote: > You might try reverting > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch No good. <b ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 22:41 ` Andrew Morton 2003-06-11 22:53 ` Bryan O'Sullivan 2003-06-11 23:11 ` Bryan O'Sullivan @ 2003-06-11 23:34 ` Robert Love 2003-06-12 0:18 ` Robert Love 2 siblings, 1 reply; 22+ messages in thread From: Robert Love @ 2003-06-11 23:34 UTC (permalink / raw) To: Andrew Morton; +Cc: Bryan O'Sullivan, linux-kernel On Wed, 2003-06-11 at 15:41, Andrew Morton wrote: > You might try reverting > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.70/2.5.70-mm8/broken-out/pci-init-ordering-fix.patch > > Something oopsed I'd say. You using radeon? That seems pretty oopsy > lately. I will debunk both theories: its not Radeon (I have a Matrox) and its not the pci-init-ordering-fix patch (I already tried that). Robert Love ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-11 23:34 ` Robert Love @ 2003-06-12 0:18 ` Robert Love 2003-06-12 0:24 ` Andrew Morton 0 siblings, 1 reply; 22+ messages in thread From: Robert Love @ 2003-06-12 0:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Bryan O'Sullivan, linux-kernel, piggin On Wed, 2003-06-11 at 16:34, Robert Love wrote: > I will debunk both theories: its not Radeon (I have a Matrox) and its > not the pci-init-ordering-fix patch (I already tried that). Ah, it is the anticipatory I/O scheduler. There is a logic thinko somewhere... I have not found it yet, but I have narrowed it down to something which the attached patch fixes (i.e., apply this patch and the problem is fixed). Maybe Nick can see the bug and short circuit my search? The problem is related to the as-autotune-write-batches patch. Robert Love Fix system deadlock in 2.5.70-mm6 and beyond. drivers/block/as-iosched.c | 68 +++++++-------------------------------------- 1 files changed, 11 insertions(+), 57 deletions(-) diff -urN linux-2.5.70-mm8/drivers/block/as-iosched.c linux/drivers/block/as-iosched.c --- linux-2.5.70-mm8/drivers/block/as-iosched.c 2003-06-11 17:12:02.919110360 -0700 +++ linux/drivers/block/as-iosched.c 2003-06-11 17:05:59.000000000 -0700 @@ -52,7 +52,7 @@ * See, the problem is: we can send a lot of writes to disk cache / TCQ in * a short amount of time... */ -#define default_write_batch_expire (HZ / 20) +#define default_write_batch_expire (5) /* * max time we may wait to anticipate a read (default around 6ms) @@ -135,9 +135,6 @@ unsigned long last_check_fifo[2]; int changed_batch; int batch_data_dir; /* current batch REQ_SYNC / REQ_ASYNC */ - int write_batch_count; /* max # of reqs in a write batch */ - int current_write_count; /* how many requests left this batch */ - int write_batch_idled; /* has the write batch gone idle? */ mempool_t *arq_pool; enum anticipation_status antic_status; @@ -938,35 +935,6 @@ } /* - * Gathers timings and resizes the write batch automatically - */ -void update_write_batch(struct as_data *ad) -{ - unsigned long batch = ad->batch_expire[REQ_ASYNC]; - long write_time; - - write_time = (jiffies - ad->current_batch_expires) + batch; - if (write_time < 0) - write_time = 0; - - if (write_time > batch + 5 && !ad->write_batch_idled) { - if (write_time / batch > 2) - ad->write_batch_count /= 2; - else - ad->write_batch_count--; - - } else if (write_time + 5 < batch && ad->current_write_count == 0) { - if (batch / write_time > 2) - ad->write_batch_count *= 2; - else - ad->write_batch_count++; - } - - if (ad->write_batch_count < 1) - ad->write_batch_count = 1; -} - -/* * as_completed_request is to be called when a request has completed and * returned something to the requesting process, be it an error or data. */ @@ -981,7 +949,8 @@ return; } - WARN_ON(blk_fs_request(rq) && arq->state == AS_RQ_NEW); + if (blk_fs_request(rq) && arq->state == AS_RQ_NEW) + printk(KERN_INFO "warning: as_completed_request got bad request\n"); if (arq->state != AS_RQ_DISPATCHED) return; @@ -999,7 +968,6 @@ */ if (ad->batch_data_dir == REQ_SYNC && ad->changed_batch && ad->batch_data_dir == arq->is_sync) { - update_write_batch(ad); ad->current_batch_expires = jiffies + ad->batch_expire[REQ_SYNC]; ad->changed_batch = 0; @@ -1151,11 +1119,10 @@ return 0; if (ad->batch_data_dir == REQ_SYNC) - /* TODO! add a check so a complete fifo gets written? */ - return time_after(jiffies, ad->current_batch_expires); + return time_after(jiffies, ad->current_batch_expires) && + time_after(jiffies, ad->fifo_expire[REQ_SYNC]); - return time_after(jiffies, ad->current_batch_expires) - || ad->current_write_count == 0; + return !ad->current_batch_expires; } /* @@ -1187,9 +1154,8 @@ put_as_io_context(&ad->as_io_context); ad->as_io_context = NULL; } - - if (ad->current_write_count != 0) - ad->current_write_count--; + if (ad->current_batch_expires) + ad->current_batch_expires--; } ad->aic_finished = 0; @@ -1218,12 +1184,6 @@ const int reads = !list_empty(&ad->fifo_list[REQ_SYNC]); const int writes = !list_empty(&ad->fifo_list[REQ_ASYNC]); - /* Signal that the write batch was uncontended, so we can't time it */ - if (ad->batch_data_dir == REQ_ASYNC && !reads) { - if (ad->current_write_count == 0 || !writes) - ad->write_batch_idled = 1; - } - if (!(reads || writes) || ad->antic_status == ANTIC_WAIT_REQ || ad->antic_status == ANTIC_WAIT_NEXT @@ -1288,8 +1248,7 @@ if (ad->batch_data_dir == REQ_SYNC) ad->changed_batch = 1; ad->batch_data_dir = REQ_ASYNC; - ad->current_write_count = ad->write_batch_count; - ad->write_batch_idled = 0; + ad->current_batch_expires = ad->batch_expire[REQ_ASYNC]; arq = ad->next_arq[ad->batch_data_dir]; goto dispatch_request; } @@ -1311,11 +1270,9 @@ if (ad->changed_batch) { if (ad->changed_batch == 1 && ad->nr_dispatched) return 0; - if (ad->batch_data_dir == REQ_ASYNC) { - ad->current_batch_expires = jiffies + - ad->batch_expire[REQ_ASYNC]; + if (ad->changed_batch == 1 && ad->batch_data_dir == REQ_ASYNC) ad->changed_batch = 0; - } else + else ad->changed_batch = 2; arq->request->flags |= REQ_HARDBARRIER; } @@ -1727,9 +1684,6 @@ e->elevator_data = ad; ad->current_batch_expires = jiffies + ad->batch_expire[REQ_SYNC]; - ad->write_batch_count = ad->batch_expire[REQ_ASYNC] / 10; - if (ad->write_batch_count < 2) - ad->write_batch_count = 2; return 0; } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-12 0:18 ` Robert Love @ 2003-06-12 0:24 ` Andrew Morton 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love 2003-06-12 17:19 ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 0 siblings, 2 replies; 22+ messages in thread From: Andrew Morton @ 2003-06-12 0:24 UTC (permalink / raw) To: Robert Love; +Cc: bos, linux-kernel, piggin Robert Love <rml@tech9.net> wrote: > > On Wed, 2003-06-11 at 16:34, Robert Love wrote: > > > I will debunk both theories: its not Radeon (I have a Matrox) and its > > not the pci-init-ordering-fix patch (I already tried that). > > Ah, it is the anticipatory I/O scheduler. > > There is a logic thinko somewhere... I have not found it yet, but I have > narrowed it down to something which the attached patch fixes (i.e., > apply this patch and the problem is fixed). > > Maybe Nick can see the bug and short circuit my search? The problem is > related to the as-autotune-write-batches patch. Do you know what the actual oops is? Odd that starting the X server triggers it. Be interesting if your patch fixes things for Brian. There appear to be several divide-by-zero possibilities in there. A random patch would be: diff -puN drivers/block/as-iosched.c~a drivers/block/as-iosched.c --- 25/drivers/block/as-iosched.c~a Wed Jun 11 17:23:42 2003 +++ 25-akpm/drivers/block/as-iosched.c Wed Jun 11 17:23:42 2003 @@ -950,13 +950,13 @@ void update_write_batch(struct as_data * write_time = 0; if (write_time > batch + 5 && !ad->write_batch_idled) { - if (write_time / batch > 2) + if (batch && (write_time / batch > 2)) ad->write_batch_count /= 2; else ad->write_batch_count--; } else if (write_time + 5 < batch && ad->current_write_count == 0) { - if (batch / write_time > 2) + if (write_time && (batch / write_time > 2)) ad->write_batch_count *= 2; else ad->write_batch_count++; _ ^ permalink raw reply [flat|nested] 22+ messages in thread
* [patch] as-iosched divide by zero fix 2003-06-12 0:24 ` Andrew Morton @ 2003-06-12 1:10 ` Robert Love 2003-06-12 1:22 ` Andrew Morton ` (2 more replies) 2003-06-12 17:19 ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 1 sibling, 3 replies; 22+ messages in thread From: Robert Love @ 2003-06-12 1:10 UTC (permalink / raw) To: Andrew Morton; +Cc: bos, linux-kernel, piggin On Wed, 2003-06-11 at 17:24, Andrew Morton wrote: > Do you know what the actual oops is? I got it all figured out now. It is a divide by zero in update_write_batch() called from as_completed_request(). > Odd that starting the X server triggers it. Be interesting if your patch > fixes things for Brian. I reproduced it without X. The divide by zero is on line 959 with the divide by 'write_time'. It can obviously be zero (see line 950). The divide by 'batch' on line 953 seems safe. The correct patch is below. Most important question: why are only some of us seeing this? Robert Love Fix as-iosched divide-by-zero bug. drivers/block/as-iosched.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff -urN linux-2.5.70-mm8/drivers/block/as-iosched.c linux/drivers/block/as-iosched.c --- linux-2.5.70-mm8/drivers/block/as-iosched.c 2003-06-11 17:12:02.000000000 -0700 +++ linux/drivers/block/as-iosched.c 2003-06-11 18:04:15.222619392 -0700 @@ -954,9 +954,9 @@ ad->write_batch_count /= 2; else ad->write_batch_count--; - + } else if (write_time + 5 < batch && ad->current_write_count == 0) { - if (batch / write_time > 2) + if (write_time && (batch / write_time > 2)) ad->write_batch_count *= 2; else ad->write_batch_count++; ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love @ 2003-06-12 1:22 ` Andrew Morton 2003-06-12 1:28 ` Robert Love 2003-06-12 1:31 ` Nick Piggin 2003-06-12 1:22 ` Nick Piggin 2003-06-12 1:54 ` Steven Cole 2 siblings, 2 replies; 22+ messages in thread From: Andrew Morton @ 2003-06-12 1:22 UTC (permalink / raw) To: Robert Love; +Cc: bos, linux-kernel, piggin Robert Love <rml@tech9.net> wrote: > > Fix as-iosched divide-by-zero bug. hrm, OK. Still not convinced about `batch'. How about this? --- 25/drivers/block/as-iosched.c~as-div-by-zero-fix 2003-06-11 18:17:04.000000000 -0700 +++ 25-akpm/drivers/block/as-iosched.c 2003-06-11 18:20:58.000000000 -0700 @@ -930,13 +930,12 @@ void update_write_batch(struct as_data * write_time = 0; if (write_time > batch + 5 && !ad->write_batch_idled) { - if (write_time / batch > 2) + if (write_time > batch * 2) ad->write_batch_count /= 2; else ad->write_batch_count--; - } else if (write_time + 5 < batch && ad->current_write_count == 0) { - if (batch / write_time > 2) + if (batch > write_time * 2) ad->write_batch_count *= 2; else ad->write_batch_count++; _ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:22 ` Andrew Morton @ 2003-06-12 1:28 ` Robert Love 2003-06-12 1:41 ` Nick Piggin 2003-06-12 1:31 ` Nick Piggin 1 sibling, 1 reply; 22+ messages in thread From: Robert Love @ 2003-06-12 1:28 UTC (permalink / raw) To: Andrew Morton; +Cc: bos, linux-kernel, piggin On Wed, 2003-06-11 at 18:22, Andrew Morton wrote: > hrm, OK. Still not convinced about `batch'. batch is only zero if {read,write}_batch_expire are zero. Nick, is that legal/desirable? Or should we prevent that in the sysfs interface? > - if (write_time / batch > 2) > + if (write_time > batch * 2) > > - if (batch / write_time > 2) > + if (batch > write_time * 2) Much better! :) Robert Love ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:28 ` Robert Love @ 2003-06-12 1:41 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2003-06-12 1:41 UTC (permalink / raw) To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel Robert Love wrote: >On Wed, 2003-06-11 at 18:22, Andrew Morton wrote: > > >>hrm, OK. Still not convinced about `batch'. >> > >batch is only zero if {read,write}_batch_expire are zero. Nick, is that >legal/desirable? Or should we prevent that in the sysfs interface? > Yeah it shouldn't really be allowed, but I think everything will keep working (with this change), even writes. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:22 ` Andrew Morton 2003-06-12 1:28 ` Robert Love @ 2003-06-12 1:31 ` Nick Piggin 2003-06-12 2:34 ` John Stoffel 1 sibling, 1 reply; 22+ messages in thread From: Nick Piggin @ 2003-06-12 1:31 UTC (permalink / raw) To: Andrew Morton; +Cc: Robert Love, bos, linux-kernel Andrew Morton wrote: >Robert Love <rml@tech9.net> wrote: > >>Fix as-iosched divide-by-zero bug. >> > >hrm, OK. Still not convinced about `batch'. > >How about this? > Yeah, thats the way to do it, of course. It was too jumpy at that setting though, so make it batch*3 (or <<1+batch if you don't want the multiply). > >--- 25/drivers/block/as-iosched.c~as-div-by-zero-fix 2003-06-11 18:17:04.000000000 -0700 >+++ 25-akpm/drivers/block/as-iosched.c 2003-06-11 18:20:58.000000000 -0700 >@@ -930,13 +930,12 @@ void update_write_batch(struct as_data * > write_time = 0; > > if (write_time > batch + 5 && !ad->write_batch_idled) { >- if (write_time / batch > 2) >+ if (write_time > batch * 2) > ad->write_batch_count /= 2; > else > ad->write_batch_count--; >- > } else if (write_time + 5 < batch && ad->current_write_count == 0) { >- if (batch / write_time > 2) >+ if (batch > write_time * 2) > ad->write_batch_count *= 2; > else > ad->write_batch_count++; > >_ > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:31 ` Nick Piggin @ 2003-06-12 2:34 ` John Stoffel 2003-06-12 4:05 ` Nick Piggin 0 siblings, 1 reply; 22+ messages in thread From: John Stoffel @ 2003-06-12 2:34 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, Robert Love, bos, linux-kernel Nick> Yeah, thats the way to do it, of course. It was too jumpy at Nick> that setting though, so make it batch*3 (or <<1+batch if you Nick> don't want the multiply). Aren't we trying to get away from magic constants like this? Or at least a better idea of why batch*3 is better than batch*2? I will admit I haven't had the chance to peer into the code, so I'm probably just being stupid (and lazy) here to speak up. I guess the real question I have is what happens if we make it batch*100, how does the affect the algorithm? And if going from 2 to 3 makes such a difference, doesn't that point to a scaling issue, i.e. we should have 200 and 300 here, so we can try out 250 as an intermediate value. *shrug* Just trying to understand... John ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 2:34 ` John Stoffel @ 2003-06-12 4:05 ` Nick Piggin 0 siblings, 0 replies; 22+ messages in thread From: Nick Piggin @ 2003-06-12 4:05 UTC (permalink / raw) To: John Stoffel; +Cc: Andrew Morton, Robert Love, bos, linux-kernel John Stoffel wrote: >Nick> Yeah, thats the way to do it, of course. It was too jumpy at >Nick> that setting though, so make it batch*3 (or <<1+batch if you >Nick> don't want the multiply). > >Aren't we trying to get away from magic constants like this? Or at >least a better idea of why batch*3 is better than batch*2? I will >admit I haven't had the chance to peer into the code, so I'm probably >just being stupid (and lazy) here to speak up. > >I guess the real question I have is what happens if we make it >batch*100, how does the affect the algorithm? And if going from 2 to >3 makes such a difference, doesn't that point to a scaling issue, >i.e. we should have 200 and 300 here, so we can try out 250 as an >intermediate value. > >*shrug* Just trying to understand... > Its not very critical. It is used to estimate how many requests will take t ms to complete, based on how many requests we sent last time around, and how long that took. As long as it increment when we were below estimate and decrement when above it should be OK. The mul/div were put there in order to adapt more quickly to changes in the request pattern. You would think that if a batch of 10 requests took twice as long as we wanted, we might as well submit 5 requests, but in testing I found that this makes the numbers jump around too much, while a 3* threshold smoothed it over while still adapting to changes nicely. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love 2003-06-12 1:22 ` Andrew Morton @ 2003-06-12 1:22 ` Nick Piggin 2003-06-12 1:29 ` Robert Love 2003-06-12 1:54 ` Steven Cole 2 siblings, 1 reply; 22+ messages in thread From: Nick Piggin @ 2003-06-12 1:22 UTC (permalink / raw) To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel Robert Love wrote: >On Wed, 2003-06-11 at 17:24, Andrew Morton wrote: > > >>Do you know what the actual oops is? >> > >I got it all figured out now. > >It is a divide by zero in update_write_batch() called from >as_completed_request(). > > >>Odd that starting the X server triggers it. Be interesting if your patch >>fixes things for Brian. >> > >I reproduced it without X. > >The divide by zero is on line 959 with the divide by 'write_time'. It >can obviously be zero (see line 950). The divide by 'batch' on line 953 >seems safe. > >The correct patch is below. > Probably put in the other check to be on the safe side. And can the check be if (!write_time || (batch / write_time > 2) > > >Most important question: why are only some of us seeing this? > It would occur if a write batch didn't take any jiffies, which isn't very likely. The HZ=100 change probbly brought it out. Thanks guys. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:22 ` Nick Piggin @ 2003-06-12 1:29 ` Robert Love 0 siblings, 0 replies; 22+ messages in thread From: Robert Love @ 2003-06-12 1:29 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, bos, linux-kernel On Wed, 2003-06-11 at 18:22, Nick Piggin wrote: > It would occur if a write batch didn't take any jiffies, which > isn't very likely. The HZ=100 change probbly brought it out. I reverted that change (yah yah, I am no help). I have seen the problem since mm6, but just today got around to looking into it. Anyhow, its fixed. Robert Love ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love 2003-06-12 1:22 ` Andrew Morton 2003-06-12 1:22 ` Nick Piggin @ 2003-06-12 1:54 ` Steven Cole 2003-06-12 2:01 ` Robert Love 2 siblings, 1 reply; 22+ messages in thread From: Steven Cole @ 2003-06-12 1:54 UTC (permalink / raw) To: Robert Love; +Cc: Andrew Morton, bos, linux-kernel, piggin On Wed, 2003-06-11 at 19:10, Robert Love wrote: > On Wed, 2003-06-11 at 17:24, Andrew Morton wrote: > > > Do you know what the actual oops is? > > I got it all figured out now. > > It is a divide by zero in update_write_batch() called from > as_completed_request(). > > > Odd that starting the X server triggers it. Be interesting if your patch > > fixes things for Brian. > > I reproduced it without X. > > The divide by zero is on line 959 with the divide by 'write_time'. It > can obviously be zero (see line 950). The divide by 'batch' on line 953 > seems safe. > > The correct patch is below. > > Most important question: why are only some of us seeing this? > > Robert Love With regards to the last, here is an anti-AOL! for the oops. I ran 2.5.70-mm8 for several hours today, doing kernel compiles and running dbench 64 on ext3, xfs, and jfs. No oops. All while running X (although that now seems moot). Base distro is RH9 if that could matter. System is UP (PIII), PREEMPT, IDE, i810 chipset. Steven ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [patch] as-iosched divide by zero fix 2003-06-12 1:54 ` Steven Cole @ 2003-06-12 2:01 ` Robert Love 0 siblings, 0 replies; 22+ messages in thread From: Robert Love @ 2003-06-12 2:01 UTC (permalink / raw) To: Steven Cole; +Cc: Andrew Morton, bos, linux-kernel, piggin On Wed, 2003-06-11 at 18:54, Steven Cole wrote: > With regards to the last, here is an anti-AOL! for the oops. I ran > 2.5.70-mm8 for several hours today, doing kernel compiles and running > dbench 64 on ext3, xfs, and jfs. No oops. > > All while running X (although that now seems moot). Base distro is RH9 > if that could matter. System is UP (PIII), PREEMPT, IDE, i810 chipset. Right. Most people are not seeing this. I have a system very similar to yours, interestingly. It is just random timings I guess. Robert Love ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-12 0:24 ` Andrew Morton 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love @ 2003-06-12 17:19 ` Bryan O'Sullivan 2003-06-12 17:29 ` Bryan O'Sullivan 2003-06-12 17:30 ` Davide Libenzi 1 sibling, 2 replies; 22+ messages in thread From: Bryan O'Sullivan @ 2003-06-12 17:19 UTC (permalink / raw) To: Andrew Morton; +Cc: Robert Love, linux-kernel, piggin On Wed, 2003-06-11 at 17:24, Andrew Morton wrote: > Odd that starting the X server triggers it. Be interesting if your patch > fixes things for Brian. I think Robert and I are seeing different things. For me, -mm6 is fine (unlike Robert's case), -mm7 oopses in the PCI init code during early boot (somewhere in the radeon init stuff, can't capture the oops easily), and -mm8 gives itself a wedgie a few seconds after starting X. I'm about to try, um, whichever of the umpty-ump patches that went back and forth looks most plausible. <b ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-12 17:19 ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan @ 2003-06-12 17:29 ` Bryan O'Sullivan 2003-06-12 17:30 ` Davide Libenzi 1 sibling, 0 replies; 22+ messages in thread From: Bryan O'Sullivan @ 2003-06-12 17:29 UTC (permalink / raw) To: Andrew Morton; +Cc: Robert Love, linux-kernel, piggin On Thu, 2003-06-12 at 10:19, Bryan O'Sullivan wrote: > I'm about to try, um, whichever of the umpty-ump patches that went back > and forth looks most plausible. Tried your two-liner, Andrew, but -mm8 is as freezy as ever. I'll see if I can hook up a serial console and find an oops at some point. <b ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-mm8: freeze after starting X 2003-06-12 17:19 ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 2003-06-12 17:29 ` Bryan O'Sullivan @ 2003-06-12 17:30 ` Davide Libenzi 1 sibling, 0 replies; 22+ messages in thread From: Davide Libenzi @ 2003-06-12 17:30 UTC (permalink / raw) To: Bryan O'Sullivan Cc: Andrew Morton, Robert Love, Linux Kernel Mailing List, piggin On Thu, 12 Jun 2003, Bryan O'Sullivan wrote: > On Wed, 2003-06-11 at 17:24, Andrew Morton wrote: > > > Odd that starting the X server triggers it. Be interesting if your patch > > fixes things for Brian. > > I think Robert and I are seeing different things. For me, -mm6 is fine > (unlike Robert's case), -mm7 oopses in the PCI init code during early > boot (somewhere in the radeon init stuff, can't capture the oops > easily), and -mm8 gives itself a wedgie a few seconds after starting X. > > I'm about to try, um, whichever of the umpty-ump patches that went back > and forth looks most plausible. I'm having total freezes with 2.5.69 in both my home laptop with a SiS650 chipset and in my machine at work with Intel Corp. 82845G/GL. Using X with Gnome (RH9) the system will end up to a completely frozen state after a random amount of time. This happens with almost no activity on the machine that makes me thing to be not related to some kind of load. IRQ are disabled and the IDE drive light remains on. I planned to debug the thing but I didn't have time yet. I set up the NMI oopser and I need to do something to get the dump since when the NMI trigger I'm in graphic mode. I was thinking about LKCD. It has never happened in console mode so it must be X/Gnome+2.5.69 - Davide ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2003-06-12 17:18 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-06-11 22:17 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 2003-06-11 22:36 ` Robert Love 2003-06-11 22:41 ` Andrew Morton 2003-06-11 22:53 ` Bryan O'Sullivan 2003-06-11 23:11 ` Bryan O'Sullivan 2003-06-11 23:34 ` Robert Love 2003-06-12 0:18 ` Robert Love 2003-06-12 0:24 ` Andrew Morton 2003-06-12 1:10 ` [patch] as-iosched divide by zero fix Robert Love 2003-06-12 1:22 ` Andrew Morton 2003-06-12 1:28 ` Robert Love 2003-06-12 1:41 ` Nick Piggin 2003-06-12 1:31 ` Nick Piggin 2003-06-12 2:34 ` John Stoffel 2003-06-12 4:05 ` Nick Piggin 2003-06-12 1:22 ` Nick Piggin 2003-06-12 1:29 ` Robert Love 2003-06-12 1:54 ` Steven Cole 2003-06-12 2:01 ` Robert Love 2003-06-12 17:19 ` 2.5.70-mm8: freeze after starting X Bryan O'Sullivan 2003-06-12 17:29 ` Bryan O'Sullivan 2003-06-12 17:30 ` Davide Libenzi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox