* [PATCH] block: fix intermittent dm timeout based oops
From: Hannes Reinecke @ 2009-03-24 7:17 UTC
To: James Bottomley; +Cc: linux-kernel, linux-scsi

Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame. This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that sometimes blk_abort_request() won't delete the
timer (if the request is marked as complete but before the timer has
been removed, a small race window). Fix this by splicing back from
the usually empty list to the q->timeout_list.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-timeout.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index bbbdc4b..6213123 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -224,6 +224,12 @@ void blk_abort_queue(struct request_queue *q)
 	list_for_each_entry_safe(rq, tmp, &list, timeout_list)
 		blk_abort_request(rq);
 
+	/*
+	 * Occasionally, blk_abort_request() will return without
+	 * deleting the element from the list
+	 */
+	list_splice(&list, &q->timeout_list);
+
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 }
--
1.5.3.2

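Editorial illustration of the race described in the patch above. This is a minimal userspace sketch, not the kernel code: the list helpers are hand-written stand-ins for the kernel's list API, `abort_request()`, `abort_queue()` and `abort_leaves_dangling_entry()` are hypothetical names, the `complete` flag is an assumed simplification of the "marked as complete" state, and the queue lock is omitted. It only shows why an entry left on the on-stack list dangles once the function returns, and how splicing back avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list helpers (illustrative). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Move all entries of 'from' onto 'to' and reset 'from' to empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	if (list_empty(from))
		return;
	from->next->prev = to;
	from->prev->next = to->next;
	to->next->prev = from->prev;
	to->next = from->next;
	init_list_head(from);
}

struct request {
	struct list_head timeout_list;
	bool complete;	/* hypothetical: completion already claimed the request */
};

struct request_queue {
	struct list_head timeout_list;
};

/* Stand-in for blk_abort_request(): a completed request keeps its entry. */
static void abort_request(struct request *rq)
{
	if (rq->complete)
		return;
	list_del_init(&rq->timeout_list);
}

/*
 * Stand-in for blk_abort_queue(), locking omitted. Returns true if some
 * request is still linked to the on-stack head when the function returns.
 */
static bool abort_queue(struct request_queue *q, bool splice_back)
{
	struct list_head list, *pos, *next;

	init_list_head(&list);
	list_splice_init(&q->timeout_list, &list);
	assert(list_empty(&q->timeout_list));

	for (pos = list.next; pos != &list; pos = next) {
		next = pos->next;	/* saved first: the entry may be unlinked */
		abort_request((struct request *)
			      ((char *)pos - offsetof(struct request, timeout_list)));
	}

	if (splice_back)	/* the fix: hand leftovers back to the queue */
		list_splice_init(&list, &q->timeout_list);

	return !list_empty(&list);
}

/* One normal request plus one that raced with completion. */
static bool abort_leaves_dangling_entry(bool splice_back)
{
	struct request_queue q;
	struct request normal = { .complete = false };
	struct request raced = { .complete = true };

	init_list_head(&q.timeout_list);
	list_add_tail(&normal.timeout_list, &q.timeout_list);
	list_add_tail(&raced.timeout_list, &q.timeout_list);
	return abort_queue(&q, splice_back);
}
```

In this sketch, `abort_leaves_dangling_entry(false)` returns true (the raced request is still linked to the stack-local head), while `abort_leaves_dangling_entry(true)` returns false. The sketch uses a splice-and-reinitialise helper on the way back only so the local head can be inspected afterwards; the patch itself uses plain `list_splice()`, which is enough because the local head goes out of scope immediately.
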
* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Christof Schmitt @ 2009-04-03 14:32 UTC
To: Jens Axboe; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Tue, Mar 24, 2009 at 08:17:30AM +0100, Hannes Reinecke wrote:
> Very rarely under stress testing of dm, oopses are occurring as
> something tampers with an old stack frame. This has been traced back
> to blk_abort_queue() leaving a timeout_list pointing to the stack.
> The reason is that sometimes blk_abort_request() won't delete the
> timer (if the request is marked as complete but before the timer has
> been removed, a small race window). Fix this by splicing back from
> the usually empty list to the q->timeout_list.

I just noticed that this fix is not upstream yet, and I have seen test
cases hitting this problem.

Jens, are you going to include this patch, or should this go through
the SCSI tree?

--
Christof Schmitt

* Re: [PATCH] block: fix intermittent dm timeout based oops
From: James Bottomley @ 2009-04-03 14:35 UTC
To: Christof Schmitt; +Cc: Jens Axboe, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, 2009-04-03 at 16:32 +0200, Christof Schmitt wrote:
> I just noticed that this fix is not upstream yet, and I have seen test
> cases hitting this problem.
>
> Jens, are you going to include this patch, or should this go through
> the SCSI tree?

It's a block patch, so it goes through the block tree ... it also needs
backporting to stable.

James

* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Jens Axboe @ 2009-04-03 18:01 UTC
To: Christof Schmitt; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, Apr 03 2009, Christof Schmitt wrote:
> I just noticed that this fix is not upstream yet, and I have seen test
> cases hitting this problem.
>
> Jens, are you going to include this patch, or should this go through
> the SCSI tree?

I will include it, and CC stable as well.

--
Jens Axboe

* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Christof Schmitt @ 2009-04-23 8:21 UTC
To: Jens Axboe; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, Apr 03, 2009 at 08:01:06PM +0200, Jens Axboe wrote:
> I will include it, and CC stable as well.

Any update on this? 2.6.30-rc3 does not have the patch.

--
Christof Schmitt

* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Jens Axboe @ 2009-04-23 8:31 UTC
To: Christof Schmitt; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Thu, Apr 23 2009, Christof Schmitt wrote:
> Any update on this? 2.6.30-rc3 does not have the patch.

I'll be sure to include it today; I need to fix one more thing before
sending a new pull request.

--
Jens Axboe
