* [PATCH] block: Free queue resources at blk_release_queue() @ 2011-09-22 13:18 Hannes Reinecke 2011-09-28 0:47 ` Jens Axboe 0 siblings, 1 reply; 61+ messages in thread From: Hannes Reinecke @ 2011-09-22 13:18 UTC (permalink / raw) To: James Bottomley; +Cc: jaxboe, linux-scsi A kernel crash is observed when a mounted ext3/ext4 filesystem is physically removed. The problem is that blk_cleanup_queue() frees up some resources eg by calling elevator_exit(), which are not checked for in normal operation. So we should rather move these calls to the destructor function blk_release_queue() as at that point all remaining references are gone. However, in doing so we have to ensure that any externally supplied queue_lock is disconnected as the driver might free up the lock after the call of blk_cleanup_queue(), Signed-off-by: Hannes Reinecke <hare@suse.de> diff --git a/block/blk-core.c b/block/blk-core.c index 90e1ffd..81a431b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -348,9 +348,10 @@ void blk_put_queue(struct request_queue *q) EXPORT_SYMBOL(blk_put_queue); /* - * Note: If a driver supplied the queue lock, it should not zap that lock - * unexpectedly as some queue cleanup components like elevator_exit() and - * blk_throtl_exit() need queue lock. + * Note: If a driver supplied the queue lock, it is disconnected + * by this function. The actual state of the lock doesn't matter + * here as the request_queue isn't accessible after this point + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. */ void blk_cleanup_queue(struct request_queue *q) { @@ -367,10 +368,8 @@ void blk_cleanup_queue(struct request_queue *q) queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); mutex_unlock(&q->sysfs_lock); - if (q->elevator) - elevator_exit(q->elevator); - - blk_throtl_exit(q); + if (q->queue_lock != &q->__queue_lock) + q->queue_lock = &q->__queue_lock; blk_put_queue(q); } diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 0ee17b5..a5a756b 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj) blk_sync_queue(q); + if (q->elevator) + elevator_exit(q->elevator); + + blk_throtl_exit(q); + if (rl->rq_pool) mempool_destroy(rl->rq_pool); ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-22 13:18 [PATCH] block: Free queue resources at blk_release_queue() Hannes Reinecke @ 2011-09-28 0:47 ` Jens Axboe 2011-09-28 0:55 ` Linus Torvalds 0 siblings, 1 reply; 61+ messages in thread From: Jens Axboe @ 2011-09-28 0:47 UTC (permalink / raw) To: Hannes Reinecke Cc: James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel, Linus Torvalds On Thu, Sep 22 2011, Hannes Reinecke wrote: > > A kernel crash is observed when a mounted ext3/ext4 filesystem > is physically removed. > The problem is that blk_cleanup_queue() frees up some resources > eg by calling elevator_exit(), which are not checked for in > normal operation. > So we should rather move these calls to the destructor function > blk_release_queue() as at that point all remaining references > are gone. > However, in doing so we have to ensure that any externally > supplied queue_lock is disconnected as the driver might > free up the lock after the call of blk_cleanup_queue(), Do we have acks from the people who saw this oops, that this really does fix it for them? I ask since this fix is vastly cleaner than the other patch floating around, so I'd much rather pull this one in. -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 0:47 ` Jens Axboe @ 2011-09-28 0:55 ` Linus Torvalds 2011-09-28 1:15 ` Jens Axboe 0 siblings, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2011-09-28 0:55 UTC (permalink / raw) To: Jens Axboe Cc: Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Tue, Sep 27, 2011 at 5:47 PM, Jens Axboe <axboe@kernel.dk> wrote: > > Do we have acks from the people who saw this oops, that this really does > fix it for them? I ask since this fix is vastly cleaner than the other > patch floating around, so I'd much rather pull this one in. I don't think I can't test it, since it seems to be some hard-to-hit race condition. It certainly doesn't seem to happen all the time - today was the first time I've seen it on one of my computers. But if you forward the actual patch to me (the one I see on lkml seems to be broken and doesn't compile cleanly because it's assiging a structure to a pointer), I'll try it out anyway. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 0:55 ` Linus Torvalds @ 2011-09-28 1:15 ` Jens Axboe 2011-09-28 1:59 ` Linus Torvalds 0 siblings, 1 reply; 61+ messages in thread From: Jens Axboe @ 2011-09-28 1:15 UTC (permalink / raw) To: Linus Torvalds Cc: Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On 2011-09-27 18:55, Linus Torvalds wrote: > On Tue, Sep 27, 2011 at 5:47 PM, Jens Axboe <axboe@kernel.dk> wrote: >> >> Do we have acks from the people who saw this oops, that this really does >> fix it for them? I ask since this fix is vastly cleaner than the other >> patch floating around, so I'd much rather pull this one in. > > I don't think I can't test it, since it seems to be some hard-to-hit > race condition. It certainly doesn't seem to happen all the time - > today was the first time I've seen it on one of my computers. > > But if you forward the actual patch to me (the one I see on lkml seems > to be broken and doesn't compile cleanly because it's assiging a > structure to a pointer), I'll try it out anyway. Thanks, that would be great. It's inlined below. diff --git a/block/blk-core.c b/block/blk-core.c index b2ed78a..d34433a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -348,9 +348,10 @@ void blk_put_queue(struct request_queue *q) EXPORT_SYMBOL(blk_put_queue); /* - * Note: If a driver supplied the queue lock, it should not zap that lock - * unexpectedly as some queue cleanup components like elevator_exit() and - * blk_throtl_exit() need queue lock. + * Note: If a driver supplied the queue lock, it is disconnected + * by this function. The actual state of the lock doesn't matter + * here as the request_queue isn't accessible after this point + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. */ void blk_cleanup_queue(struct request_queue *q) { @@ -367,10 +368,8 @@ void blk_cleanup_queue(struct request_queue *q) queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); mutex_unlock(&q->sysfs_lock); - if (q->elevator) - elevator_exit(q->elevator); - - blk_throtl_exit(q); + if (q->queue_lock != &q->__queue_lock) + q->queue_lock = &q->__queue_lock; blk_put_queue(q); } diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index e681805..60fda88 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -479,6 +479,11 @@ static void blk_release_queue(struct kobject *kobj) blk_sync_queue(q); + if (q->elevator) + elevator_exit(q->elevator); + + blk_throtl_exit(q); + if (rl->rq_pool) mempool_destroy(rl->rq_pool); -- Jens Axboe ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 1:15 ` Jens Axboe @ 2011-09-28 1:59 ` Linus Torvalds 2011-09-28 2:02 ` Jens Axboe 2011-09-28 4:10 ` James Bottomley 0 siblings, 2 replies; 61+ messages in thread From: Linus Torvalds @ 2011-09-28 1:59 UTC (permalink / raw) To: Jens Axboe Cc: Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: >> >> But if you forward the actual patch to me (the one I see on lkml seems >> to be broken and doesn't compile cleanly because it's assiging a >> structure to a pointer), I'll try it out anyway. > > Thanks, that would be great. It's inlined below. Well, I did several USB eject events, and nothing bad happened. But as mentioned, I don't think that means much. Have you tried this with slub debugging and poisoning? It might be worth some extra testing that way. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 1:59 ` Linus Torvalds @ 2011-09-28 2:02 ` Jens Axboe 2011-09-28 4:10 ` James Bottomley 1 sibling, 0 replies; 61+ messages in thread From: Jens Axboe @ 2011-09-28 2:02 UTC (permalink / raw) To: Linus Torvalds Cc: Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On 2011-09-27 19:59, Linus Torvalds wrote: > On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: >>> >>> But if you forward the actual patch to me (the one I see on lkml seems >>> to be broken and doesn't compile cleanly because it's assiging a >>> structure to a pointer), I'll try it out anyway. >> >> Thanks, that would be great. It's inlined below. > > Well, I did several USB eject events, and nothing bad happened. > > But as mentioned, I don't think that means much. Have you tried this > with slub debugging and poisoning? It might be worth some extra > testing that way. I'm traveling at the moment, and don't have a stick handy here. I'll try it tomorrow morning, before and after (both with pagealloc etc debuggin on). -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 1:59 ` Linus Torvalds 2011-09-28 2:02 ` Jens Axboe @ 2011-09-28 4:10 ` James Bottomley 2011-09-28 14:08 ` Jens Axboe 1 sibling, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-09-28 4:10 UTC (permalink / raw) To: Linus Torvalds Cc: Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Tue, 2011-09-27 at 18:59 -0700, Linus Torvalds wrote: > On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: > >> > >> But if you forward the actual patch to me (the one I see on lkml seems > >> to be broken and doesn't compile cleanly because it's assiging a > >> structure to a pointer), I'll try it out anyway. > > > > Thanks, that would be great. It's inlined below. > > Well, I did several USB eject events, and nothing bad happened. > > But as mentioned, I don't think that means much. Have you tried this > with slub debugging and poisoning? It might be worth some extra > testing that way. I've been testing a simpler version: http://marc.info/?l=linux-kernel&m=131300594629839 For a long time now with great success. The only difference is the locking cleanup, but SCSI doesn't need that since it doesn't supply its own lock, so I've been voting for this as the final fix for a while now. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 4:10 ` James Bottomley @ 2011-09-28 14:08 ` Jens Axboe 2011-09-28 14:11 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Jens Axboe @ 2011-09-28 14:08 UTC (permalink / raw) To: James Bottomley Cc: Linus Torvalds, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On 2011-09-27 22:10, James Bottomley wrote: > On Tue, 2011-09-27 at 18:59 -0700, Linus Torvalds wrote: >> On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: >>>> >>>> But if you forward the actual patch to me (the one I see on lkml seems >>>> to be broken and doesn't compile cleanly because it's assiging a >>>> structure to a pointer), I'll try it out anyway. >>> >>> Thanks, that would be great. It's inlined below. >> >> Well, I did several USB eject events, and nothing bad happened. >> >> But as mentioned, I don't think that means much. Have you tried this >> with slub debugging and poisoning? It might be worth some extra >> testing that way. > > I've been testing a simpler version: > > http://marc.info/?l=linux-kernel&m=131300594629839 > > For a long time now with great success. The only difference is the > locking cleanup, but SCSI doesn't need that since it doesn't supply its > own lock, so I've been voting for this as the final fix for a while now. The locking cleanup looks good, though, for devices that do use the embedded lock. But functionally they should be the same for SCSI, so if you had great success with it, then that's a good data point. -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] block: Free queue resources at blk_release_queue() 2011-09-28 14:08 ` Jens Axboe @ 2011-09-28 14:11 ` James Bottomley 2011-09-28 14:14 ` [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) Jens Axboe 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-09-28 14:11 UTC (permalink / raw) To: Jens Axboe Cc: Linus Torvalds, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, 2011-09-28 at 08:08 -0600, Jens Axboe wrote: > On 2011-09-27 22:10, James Bottomley wrote: > > On Tue, 2011-09-27 at 18:59 -0700, Linus Torvalds wrote: > >> On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: > >>>> > >>>> But if you forward the actual patch to me (the one I see on lkml seems > >>>> to be broken and doesn't compile cleanly because it's assiging a > >>>> structure to a pointer), I'll try it out anyway. > >>> > >>> Thanks, that would be great. It's inlined below. > >> > >> Well, I did several USB eject events, and nothing bad happened. > >> > >> But as mentioned, I don't think that means much. Have you tried this > >> with slub debugging and poisoning? It might be worth some extra > >> testing that way. > > > > I've been testing a simpler version: > > > > http://marc.info/?l=linux-kernel&m=131300594629839 > > > > For a long time now with great success. The only difference is the > > locking cleanup, but SCSI doesn't need that since it doesn't supply its > > own lock, so I've been voting for this as the final fix for a while now. > > The locking cleanup looks good, though, for devices that do use the > embedded lock. Exactly ... it's the missing piece; without it my patch is really only addressing SCSI. > But functionally they should be the same for SCSI, so if > you had great success with it, then that's a good data point. Right, I've run it through the memory debugger and USB ejection testing (with ext2, which seems to be the fs that triggers this problem the most). James ^ permalink raw reply [flat|nested] 61+ messages in thread
* [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 14:11 ` James Bottomley @ 2011-09-28 14:14 ` Jens Axboe 2011-09-28 15:22 ` Linus Torvalds 2011-10-13 13:09 ` Steffen Maier 0 siblings, 2 replies; 61+ messages in thread From: Jens Axboe @ 2011-09-28 14:14 UTC (permalink / raw) To: James Bottomley Cc: Linus Torvalds, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On 2011-09-28 08:11, James Bottomley wrote: > On Wed, 2011-09-28 at 08:08 -0600, Jens Axboe wrote: >> On 2011-09-27 22:10, James Bottomley wrote: >>> On Tue, 2011-09-27 at 18:59 -0700, Linus Torvalds wrote: >>>> On Tue, Sep 27, 2011 at 6:15 PM, Jens Axboe <axboe@kernel.dk> wrote: >>>>>> >>>>>> But if you forward the actual patch to me (the one I see on lkml seems >>>>>> to be broken and doesn't compile cleanly because it's assiging a >>>>>> structure to a pointer), I'll try it out anyway. >>>>> >>>>> Thanks, that would be great. It's inlined below. >>>> >>>> Well, I did several USB eject events, and nothing bad happened. >>>> >>>> But as mentioned, I don't think that means much. Have you tried this >>>> with slub debugging and poisoning? It might be worth some extra >>>> testing that way. >>> >>> I've been testing a simpler version: >>> >>> http://marc.info/?l=linux-kernel&m=131300594629839 >>> >>> For a long time now with great success. The only difference is the >>> locking cleanup, but SCSI doesn't need that since it doesn't supply its >>> own lock, so I've been voting for this as the final fix for a while now. >> >> The locking cleanup looks good, though, for devices that do use the >> embedded lock. > > Exactly ... it's the missing piece; without it my patch is really only > addressing SCSI. > >> But functionally they should be the same for SCSI, so if >> you had great success with it, then that's a good data point. > > Right, I've run it through the memory debugger and USB ejection testing > (with ext2, which seems to be the fs that triggers this problem the > most). Alright, lets call this fully tested and fixed then. Linus, I committed it, please pull: git://git.kernel.dk/linux-block.git for-linus Hannes Reinecke (1): block: Free queue resources at blk_release_queue() block/blk-core.c | 13 ++++++------- block/blk-sysfs.c | 5 +++++ 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index b2ed78a..d34433a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -348,9 +348,10 @@ void blk_put_queue(struct request_queue *q) EXPORT_SYMBOL(blk_put_queue); /* - * Note: If a driver supplied the queue lock, it should not zap that lock - * unexpectedly as some queue cleanup components like elevator_exit() and - * blk_throtl_exit() need queue lock. + * Note: If a driver supplied the queue lock, it is disconnected + * by this function. The actual state of the lock doesn't matter + * here as the request_queue isn't accessible after this point + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. */ void blk_cleanup_queue(struct request_queue *q) { @@ -367,10 +368,8 @@ void blk_cleanup_queue(struct request_queue *q) queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); mutex_unlock(&q->sysfs_lock); - if (q->elevator) - elevator_exit(q->elevator); - - blk_throtl_exit(q); + if (q->queue_lock != &q->__queue_lock) + q->queue_lock = &q->__queue_lock; blk_put_queue(q); } diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index e681805..60fda88 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -479,6 +479,11 @@ static void blk_release_queue(struct kobject *kobj) blk_sync_queue(q); + if (q->elevator) + elevator_exit(q->elevator); + + blk_throtl_exit(q); + if (rl->rq_pool) mempool_destroy(rl->rq_pool); -- Jens Axboe ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 14:14 ` [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) Jens Axboe @ 2011-09-28 15:22 ` Linus Torvalds 2011-09-28 15:43 ` James Bottomley 2011-10-13 13:09 ` Steffen Maier 1 sibling, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2011-09-28 15:22 UTC (permalink / raw) To: Jens Axboe Cc: James Bottomley, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 7:14 AM, Jens Axboe <axboe@kernel.dk> wrote: > > /* > - * Note: If a driver supplied the queue lock, it should not zap that lock > - * unexpectedly as some queue cleanup components like elevator_exit() and > - * blk_throtl_exit() need queue lock. > + * Note: If a driver supplied the queue lock, it is disconnected > + * by this function. The actual state of the lock doesn't matter > + * here as the request_queue isn't accessible after this point > + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. > */ So quite frankly, I just don't believe in that comment. If no more requests will be queued or completed, then the queue lock is irrelevant and should not be changed. More importantly, if no more requests are queued or completed after blk_cleanup_queue(), then we wouldn't have had the bug that we clearly had with the elevator accesses, now would we? So the comment seems to be obviously bogus and wrong. I pulled this, but I think the "just move the teardown" would have been the safer option. What happens if a request completes on another CPU just as we are changing locks, and we lock one lock and then unlock another?! Linus -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 15:22 ` Linus Torvalds @ 2011-09-28 15:43 ` James Bottomley 2011-09-28 17:48 ` Vivek Goyal 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-09-28 15:43 UTC (permalink / raw) To: Linus Torvalds Cc: Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, 2011-09-28 at 08:22 -0700, Linus Torvalds wrote: > On Wed, Sep 28, 2011 at 7:14 AM, Jens Axboe <axboe@kernel.dk> wrote: > > > > /* > > - * Note: If a driver supplied the queue lock, it should not zap that lock > > - * unexpectedly as some queue cleanup components like elevator_exit() and > > - * blk_throtl_exit() need queue lock. > > + * Note: If a driver supplied the queue lock, it is disconnected > > + * by this function. The actual state of the lock doesn't matter > > + * here as the request_queue isn't accessible after this point > > + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. > > */ > > So quite frankly, I just don't believe in that comment. > > If no more requests will be queued or completed, then the queue lock > is irrelevant and should not be changed. That was my original argument for my patch. I lost it because you can still hold a queue reference in the sysfs code for block, which means that the put in blk_cleanup_queue() won't be the final one and you'll get a use after free of the lock when the sysfs directory is exited because we take the lock again as we destroy the elevator. > More importantly, if no more requests are queued or completed after > blk_cleanup_queue(), then we wouldn't have had the bug that we clearly > had with the elevator accesses, now would we? So the comment seems to > be obviously bogus and wrong. So this I agree with. blk_cleanup_queue() prevents any new access to the queue, but we still have the old reference holders to contend with. They can submit requests, although we try to error them again with the queue guards check. > I pulled this, but I think the "just move the teardown" would have > been the safer option. What happens if a request completes on another > CPU just as we are changing locks, and we lock one lock and then > unlock another?! The only code for which this could be true is code where we use the block supplied lock, so effectively it never changes. The drivers which supply their own lock are supposed to have already ensured that the queue is unused. I don't really believe this given the sysfs example above, but this fix is no worse than the use after free that would have resulted with the previous code. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 15:43 ` James Bottomley @ 2011-09-28 17:48 ` Vivek Goyal 2011-09-28 17:53 ` Christoph Hellwig 2011-09-28 17:59 ` James Bottomley 0 siblings, 2 replies; 61+ messages in thread From: Vivek Goyal @ 2011-09-28 17:48 UTC (permalink / raw) To: James Bottomley Cc: Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 10:43:36AM -0500, James Bottomley wrote: > On Wed, 2011-09-28 at 08:22 -0700, Linus Torvalds wrote: > > On Wed, Sep 28, 2011 at 7:14 AM, Jens Axboe <axboe@kernel.dk> wrote: > > > > > > /* > > > - * Note: If a driver supplied the queue lock, it should not zap that lock > > > - * unexpectedly as some queue cleanup components like elevator_exit() and > > > - * blk_throtl_exit() need queue lock. > > > + * Note: If a driver supplied the queue lock, it is disconnected > > > + * by this function. The actual state of the lock doesn't matter > > > + * here as the request_queue isn't accessible after this point > > > + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. > > > */ > > > > So quite frankly, I just don't believe in that comment. > > > > If no more requests will be queued or completed, then the queue lock > > is irrelevant and should not be changed. > > That was my original argument for my patch. I lost it because you can > still hold a queue reference in the sysfs code for block, which means > that the put in blk_cleanup_queue() won't be the final one and you'll > get a use after free of the lock when the sysfs directory is exited > because we take the lock again as we destroy the elevator. > > > More importantly, if no more requests are queued or completed after > > blk_cleanup_queue(), then we wouldn't have had the bug that we clearly > > had with the elevator accesses, now would we? So the comment seems to > > be obviously bogus and wrong. > > So this I agree with. blk_cleanup_queue() prevents any new access to > the queue, but we still have the old reference holders to contend with. > They can submit requests, although we try to error them again with the > queue guards check. > > > I pulled this, but I think the "just move the teardown" would have > > been the safer option. What happens if a request completes on another > > CPU just as we are changing locks, and we lock one lock and then > > unlock another?! > > The only code for which this could be true is code where we use the > block supplied lock, so effectively it never changes. > The drivers which supply their own lock are supposed to have already > ensured that the queue is unused. Hi James, For my education purposes, how will driver come to know that queue is unused? Does it happen by checking if any requsts are queued or not? If yes, we might run into issues with throttling logic. For example, if some bio have been throttled and are queued in some data structures on queue. In that case driver does not even know that some bios are queued and will be submitted later. Now if driver calls blk_cleanup_queue() it might happen that throttling related worker is already queue lock and trying to do some housekeeping or trying to dispatch bio etc. Now if queue lock is swapped, it will just cause all the kind of issues. I am wondering if we should retain blk_throtl_exit() in blk_cleanup_queue() before lock swap and just move elevator cleanup in blk_release_queue(). A note to myself, I should probably enhance blk_throtl_exit() to look for any queued throttled bio and single their completion with error (-ENODEV) or something like that. Thanks Vivek ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 17:48 ` Vivek Goyal @ 2011-09-28 17:53 ` Christoph Hellwig 2011-09-28 18:09 ` Vivek Goyal 2011-09-28 17:59 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: Christoph Hellwig @ 2011-09-28 17:53 UTC (permalink / raw) To: Vivek Goyal Cc: James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 01:48:59PM -0400, Vivek Goyal wrote: > I am wondering if we should retain blk_throtl_exit() in blk_cleanup_queue() > before lock swap and just move elevator cleanup in blk_release_queue(). > > A note to myself, I should probably enhance blk_throtl_exit() to look for any > queued throttled bio and single their completion with error (-ENODEV) or > something like that. The root of this evil is how queue_lock is implemented and (ab)used. Instead of letting the driver assign a pointer to make the core use its locks we really need to make the queue_lock a lock embedded directly into the queue, and drivers may or may not use that lock for their internal data structures. For high performance drivers they preferable should use their own locks as queue_lock is far too contended already for any high IOPS device. The same applies to throtteling btw - instead of overloading an already highly contended lock it really should have its own. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 17:53 ` Christoph Hellwig @ 2011-09-28 18:09 ` Vivek Goyal 2011-09-28 18:16 ` Christoph Hellwig 0 siblings, 1 reply; 61+ messages in thread From: Vivek Goyal @ 2011-09-28 18:09 UTC (permalink / raw) To: Christoph Hellwig Cc: James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 01:53:04PM -0400, Christoph Hellwig wrote: > On Wed, Sep 28, 2011 at 01:48:59PM -0400, Vivek Goyal wrote: > > I am wondering if we should retain blk_throtl_exit() in blk_cleanup_queue() > > before lock swap and just move elevator cleanup in blk_release_queue(). > > > > A note to myself, I should probably enhance blk_throtl_exit() to look for any > > queued throttled bio and single their completion with error (-ENODEV) or > > something like that. > > The root of this evil is how queue_lock is implemented and (ab)used. > Instead of letting the driver assign a pointer to make the core use > its locks we really need to make the queue_lock a lock embedded directly > into the queue, and drivers may or may not use that lock for their > internal data structures. For high performance drivers they preferable > should use their own locks as queue_lock is far too contended already > for any high IOPS device. The same applies to throtteling btw - > instead of overloading an already highly contended lock it really > should have its own. I had thought of implementing a separate lock for throttling. Then I noticed few operations like checking for queue flags where I would be required to hold queue locks. So I could do lock nesting. hold throttling_lock; hold queue_lock; IIRC, I also had noticed some operations where queue might want to call into throttling with queue lock held and that would have led to lock order problems. So I had given up on the idea and continued to use queue lock for throttling. Thought it could still probably be done if one could justify additional complexity. Thanks Vivek ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 18:09 ` Vivek Goyal @ 2011-09-28 18:16 ` Christoph Hellwig 2011-09-28 19:05 ` Eric Seppanen 2011-09-28 22:34 ` Vivek Goyal 0 siblings, 2 replies; 61+ messages in thread From: Christoph Hellwig @ 2011-09-28 18:16 UTC (permalink / raw) To: Vivek Goyal Cc: James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 02:09:05PM -0400, Vivek Goyal wrote: > I had thought of implementing a separate lock for throttling. Then I > noticed few operations like checking for queue flags where I would > be required to hold queue locks. Can you do a writeup of these issues? E.g. if the flags are throtteling specific we can move them to a separate flags field, if not we can see if we can make them atomic or similar. Right now on high iops device queue_lock is the major killer for performance. It's one major reason (*) why a lot of the high iops devices are all moving to ->make_request, which has other issues. (*) others are struct request allocation and the pointless merge hash ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 18:16 ` Christoph Hellwig @ 2011-09-28 19:05 ` Eric Seppanen 2011-09-28 19:14 ` Christoph Hellwig 2011-11-30 10:18 ` Jens Axboe 2011-09-28 22:34 ` Vivek Goyal 1 sibling, 2 replies; 61+ messages in thread From: Eric Seppanen @ 2011-09-28 19:05 UTC (permalink / raw) To: Christoph Hellwig Cc: Vivek Goyal, James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 11:16 AM, Christoph Hellwig <hch@infradead.org> wrote: > Right now on high iops device queue_lock is the major killer for > performance. It's one major reason (*) why a lot of the high iops devices > are all moving to ->make_request, which has other issues. > > (*) others are struct request allocation and the pointless merge hash I agree: queue lock is the worst performance killer when hw can do >100K IOPS per block device. Rather than just being chased away from the request queue due to performance issues, I could argue there's very little point to having a queue for devices that (a) have no seek penalty (and always use noop elevator) (b) have hardware queues at least as deep as the default request queue (c) don't benefit from merging (c) is maybe debatable, but if a device can saturate its bus bandwidth on 4KB IO, the latency is probably not worth it. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 19:05 ` Eric Seppanen @ 2011-09-28 19:14 ` Christoph Hellwig 2011-11-30 10:18 ` Jens Axboe 1 sibling, 0 replies; 61+ messages in thread From: Christoph Hellwig @ 2011-09-28 19:14 UTC (permalink / raw) To: Eric Seppanen Cc: Vivek Goyal, James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 12:05:10PM -0700, Eric Seppanen wrote: > I agree: queue lock is the worst performance killer when hw can do > >100K IOPS per block device. > > Rather than just being chased away from the request queue due to > performance issues, I could argue there's very little point to having > a queue for devices that > (a) have no seek penalty (and always use noop elevator) > (b) have hardware queues at least as deep as the default request queue > (c) don't benefit from merging > > (c) is maybe debatable, but if a device can saturate its bus bandwidth > on 4KB IO, the latency is probably not worth it. In theory, yes. But at some point we will be able to saturate them, and then people want proportional I/O, light amounts of queueing, etc. And I really don't want to reinvent that for every little device. The other problem is that we a single driver might driver totally different types of devices, already today we have iSCSI or FC accessibly high IOPS devices, there are good PCI-e flash devices masquerading as AHCI, and my current problem is that the queue_lock really hurts me in virtio-blk when using a PCIe flash device underneath. So we really need some infrastructure that allows a generic interface to the driver, and allow us to plug in merging, scheduling, queueing on an as needed basis. That is my long term plan - making request_lock suck a little less, and improving the driver interface is a good first step, though. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 19:05 ` Eric Seppanen 2011-09-28 19:14 ` Christoph Hellwig @ 2011-11-30 10:18 ` Jens Axboe 2011-11-30 10:26 ` Christoph Hellwig 1 sibling, 1 reply; 61+ messages in thread From: Jens Axboe @ 2011-11-30 10:18 UTC (permalink / raw) To: Eric Seppanen Cc: Christoph Hellwig, Vivek Goyal, James Bottomley, Linus Torvalds, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On 2011-09-28 21:05, Eric Seppanen wrote: > On Wed, Sep 28, 2011 at 11:16 AM, Christoph Hellwig <hch@infradead.org> wrote: >> Right now on high iops device queue_lock is the major killer for >> performance. It's one major reason (*) why a lot of the high iops devices >> are all moving to ->make_request, which has other issues. >> >> (*) others are struct request allocation and the pointless merge hash > > I agree: queue lock is the worst performance killer when hw can do >> 100K IOPS per block device. > > Rather than just being chased away from the request queue due to > performance issues, I could argue there's very little point to having > a queue for devices that > (a) have no seek penalty (and always use noop elevator) > (b) have hardware queues at least as deep as the default request queue > (c) don't benefit from merging > > (c) is maybe debatable, but if a device can saturate its bus bandwidth > on 4KB IO, the latency is probably not worth it. I agree on a+b, but c is definitely more than debatable. I have yet to see a device saturate its bandwidth on 4KB IOS. So merging on the write side is always going to be a win. -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-30 10:18 ` Jens Axboe @ 2011-11-30 10:26 ` Christoph Hellwig 0 siblings, 0 replies; 61+ messages in thread From: Christoph Hellwig @ 2011-11-30 10:26 UTC (permalink / raw) To: Jens Axboe Cc: Eric Seppanen, Christoph Hellwig, Vivek Goyal, James Bottomley, Linus Torvalds, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Nov 30, 2011 at 11:18:42AM +0100, Jens Axboe wrote: > I agree on a+b, but c is definitely more than debatable. I have yet to > see a device saturate its bandwidth on 4KB IOS. So merging on the write > side is always going to be a win. We shouldn't submit 4k I/O in the first place unless they truely are random. If you look at XFS for example you'd basically never see them. The only small I/O in a typical workloads are individual btree blocks and AG header updates, which generally can't be merged anyway. Data I/O happens in larger chunks generally, as do reads/writes of inodes (generally in 32k chunks, with plans to go larger). Note that the merges in the way the are done currently are one of the most painful bits of the current request_fn based drivers. They require the addition of the struct request data structure, which needs to be allocated and initialized for every bio, no matter if it's beeing merged or not, and are the prime reason why make_request_fn and request_fn style drivers can't operate on the same data structures. I've been wondering if it would be possible to simply chain bios for merging, and while it's theoretically possible I'm wondering about the impact on drivers. Fortunately scsi drivers generally don't touch struct request directly, so it might be doable. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 18:16 ` Christoph Hellwig 2011-09-28 19:05 ` Eric Seppanen @ 2011-09-28 22:34 ` Vivek Goyal 1 sibling, 0 replies; 61+ messages in thread From: Vivek Goyal @ 2011-09-28 22:34 UTC (permalink / raw) To: Christoph Hellwig Cc: James Bottomley, Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, Sep 28, 2011 at 02:16:49PM -0400, Christoph Hellwig wrote: > On Wed, Sep 28, 2011 at 02:09:05PM -0400, Vivek Goyal wrote: > > I had thought of implementing a separate lock for throttling. Then I > > noticed few operations like checking for queue flags where I would > > be required to hold queue locks. > > Can you do a writeup of these issues? E.g. if the flags are throtteling > specific we can move them to a separate flags field, if not we can > see if we can make them atomic or similar. Sure. I am going through the code and trying to list down some of the dependencies. But before that I would say that tyring to optimize the throttling code to reduce the queue lock contention should be last on the priority list. The simple reason being that in common case it does not incur any locking cost. One needs to have IO cgroup configured and needs to have some throttling rules put in place only then queue lock cost will come into picture. And for most of the people, I don't think that's the common case. Now coming back to question of dependencies. We take some services from request queue which needs to happen under queue lock. - Looks like in current code I am only accessing QUEUE_FLAG_DEAD. I see that flag is not synchronized using queue lock. So it probably is not a concern. - I am using tracing functionality. blk_add_trace_msg(). I guess it is not required to take queue lock for this. But I am not sure. - I am accessing q->backing_dev_info to get to associated devices's major and minor number. That's it I guess. That's what a quick look tells me. So looks like it is not too hard to convert existing blk throttle code to use its own lock. That will also get rid of dependency on queue lock and in turn uncertainity associated with queue lock being valid (if driver provided the lock). Thanks Vivek ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 17:48 ` Vivek Goyal 2011-09-28 17:53 ` Christoph Hellwig @ 2011-09-28 17:59 ` James Bottomley 1 sibling, 0 replies; 61+ messages in thread From: James Bottomley @ 2011-09-28 17:59 UTC (permalink / raw) To: Vivek Goyal Cc: Linus Torvalds, Jens Axboe, Hannes Reinecke, James Bottomley, linux-scsi@vger.kernel.org, Linux Kernel On Wed, 2011-09-28 at 13:48 -0400, Vivek Goyal wrote: > On Wed, Sep 28, 2011 at 10:43:36AM -0500, James Bottomley wrote: > > On Wed, 2011-09-28 at 08:22 -0700, Linus Torvalds wrote: > > > On Wed, Sep 28, 2011 at 7:14 AM, Jens Axboe <axboe@kernel.dk> wrote: > > > > > > > > /* > > > > - * Note: If a driver supplied the queue lock, it should not zap that lock > > > > - * unexpectedly as some queue cleanup components like elevator_exit() and > > > > - * blk_throtl_exit() need queue lock. > > > > + * Note: If a driver supplied the queue lock, it is disconnected > > > > + * by this function. The actual state of the lock doesn't matter > > > > + * here as the request_queue isn't accessible after this point > > > > + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. > > > > */ > > > > > > So quite frankly, I just don't believe in that comment. > > > > > > If no more requests will be queued or completed, then the queue lock > > > is irrelevant and should not be changed. > > > > That was my original argument for my patch. I lost it because you can > > still hold a queue reference in the sysfs code for block, which means > > that the put in blk_cleanup_queue() won't be the final one and you'll > > get a use after free of the lock when the sysfs directory is exited > > because we take the lock again as we destroy the elevator. > > > > > More importantly, if no more requests are queued or completed after > > > blk_cleanup_queue(), then we wouldn't have had the bug that we clearly > > > had with the elevator accesses, now would we? So the comment seems to > > > be obviously bogus and wrong. > > > > So this I agree with. blk_cleanup_queue() prevents any new access to > > the queue, but we still have the old reference holders to contend with. > > They can submit requests, although we try to error them again with the > > queue guards check. > > > > > I pulled this, but I think the "just move the teardown" would have > > > been the safer option. What happens if a request completes on another > > > CPU just as we are changing locks, and we lock one lock and then > > > unlock another?! > > > > The only code for which this could be true is code where we use the > > block supplied lock, so effectively it never changes. > > > The drivers which supply their own lock are supposed to have already > > ensured that the queue is unused. > > Hi James, > > For my education purposes, how will driver come to know that queue is > unused? Does it happen by checking if any requsts are queued or not? If > yes, we might run into issues with throttling logic. I can't explain this ... it's the bit I think is bogus. If we need refcounted queues, it's impossible to satisfy and if we don't, why do we have refcounts? The root cause of this is allowing drivers to specify locks. If we really want to continue doing this, we should have a lock release function which is called as part of the queue refcounting model where the final lock free can be done. That's a big amount of work if you look at all the block drivers which use their own locks. James > For example, if some bio have been throttled and are queued in some data > structures on queue. In that case driver does not even know that some bios > are queued and will be submitted later. Now if driver calls blk_cleanup_queue() > it might happen that throttling related worker is already queue lock and > trying to do some housekeeping or trying to dispatch bio etc. Now if queue > lock is swapped, it will just cause all the kind of issues. > > I am wondering if we should retain blk_throtl_exit() in blk_cleanup_queue() > before lock swap and just move elevator cleanup in blk_release_queue(). > > A note to myself, I should probably enhance blk_throtl_exit() to look for any > queued throttled bio and single their completion with error (-ENODEV) or > something like that. > > Thanks > Vivek > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-09-28 14:14 ` [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) Jens Axboe 2011-09-28 15:22 ` Linus Torvalds @ 2011-10-13 13:09 ` Steffen Maier 2011-10-14 16:03 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: Steffen Maier @ 2011-10-13 13:09 UTC (permalink / raw) To: linux-scsi@vger.kernel.org Cc: Jens Axboe, James Bottomley, Hannes Reinecke, Linux Kernel, Alan Stern, Jun'ichi Nomura, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Heiko Carstens This fix also went into 3.0.5 via http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=releases/3.0.5/block-free-queue-resources-at-blk_release_queue.patch (originated at http://marc.info/?l=linux-scsi&m=131669751909474&w=2 and http://marc.info/?l=linux-scsi&m=131669414205696&w=2) However, it seems we still have a use-after-free bug, now causing the following oops with kernel 3.0.6 when removing paths to storage while generating load on device-mapper multipath disks: > Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC > Modules linked in: iptable_filter ip_tables x_tables dm_round_robin sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] > CPU: 1 Not tainted 3.0.6-50.x.20111006-s390xdefault #1 > Process blast.LzS_64_SL (pid: 26613, task: 0000000063e2a398, ksp: 0000000064de3560) > Krnl PSW : 0704100180000000 000000000048a038 (scsi_print_command+0x44/0xf8) > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 > Krnl GPRS: 000000000000006b 6b6b6b6b6b6b6b6b 000000006717f800 000000000094f2e0 > 000000000061242e 000000000062bd88 0000000066fb90d8 0000000065391ad7 > 000000006717f800 000000006717f800 000000006716a090 000000006717f800 > 0000000000000004 0000000000672f88 0000000064de3838 0000000064de3808 > Krnl Code: 000000000048a026: f0b80004ebbf srp 4(12,%r0),3007(%r14),8 > 000000000048a02c: f0a0000407f4 srp 4(11,%r0),2036,0 > 000000000048a032: e31020800004 lg %r1,128(%r2) > >000000000048a038: e31010b00004 lg %r1,176(%r1) > 000000000048a03e: b9020011 ltgr %r1,%r1 > 000000000048a042: a7840032 brc 8,48a0a6 > 000000000048a046: e33020000004 lg %r3,0(%r2) > 000000000048a04c: c04000182f4c larl %r4,78fee4 > Call Trace: > ([<000000006717f800>] 0x6717f800) > [<0000000000487f28>] scsi_log_send+0xf0/0x130 > [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc > [<0000000000490694>] scsi_request_fn+0x3b8/0x480 > [<00000000003fe4c8>] __elv_add_request+0x174/0x35c > [<0000000000402fd6>] blk_insert_cloned_request+0x72/0xcc > [<000003e001e134f2>] dm_dispatch_request+0x5a/0x94 [dm_mod] > [<000003e001e15f36>] dm_request_fn+0x216/0x398 [dm_mod] > [<0000000000405590>] queue_unplugged+0x60/0x1a8 > [<0000000000407470>] blk_flush_plug_list+0x218/0x29c > [<00000000006136ec>] io_schedule+0x94/0x100 > [<000000000020e6d6>] sleep_on_page_killable+0x22/0x60 > [<00000000006140a8>] __wait_on_bit_lock+0xc0/0x124 > [<000000000020e5f8>] __lock_page_killable+0x78/0x84 > [<0000000000210690>] generic_file_aio_read+0x56c/0x800 > [<000000000027a0a0>] do_sync_read+0xcc/0x11c > [<000000000027a974>] vfs_read+0xb0/0x194 > [<000000000027aab0>] SyS_read+0x58/0xb4 > [<00000000006185f6>] sysc_noemu+0x16/0x1c > [<000003fffd473bd4>] 0x3fffd473bd4 > INFO: lockdep is turned off. > Last Breaking-Event-Address: > [<000000000048a020>] scsi_print_command+0x2c/0xf8 During a different run, we also got a similar oops which was not logged but only its panic_on_oops: > Kernel panic - not syncing: Fatal exception: panic_on_oops > CPU: 1 Tainted: G D 3.0.6-50.x.20111006-s390xdefault #1 > Process blast.LzS_64_SL (pid: 6551, task: 00000000a2942398, ksp: 00000000a11834e0) > 00000000a1183630 00000000a11835b0 0000000000000002 0000000000000000 > 00000000a1183650 00000000a11835c8 00000000a11835c8 000000000061242e > 0000000000000000 0000000000000000 00000000a3176ed7 0000000000100ed2 > 000000000000000d 000000000000000c 00000000a1183618 0000000000000000 > 0000000000000000 0000000000100a18 00000000a11835b0 00000000a11835f0 > Call Trace: > ([<0000000000100926>] show_trace+0xee/0x144) > [<0000000000612258>] panic+0xb0/0x234 > [<0000000000100ed2>] die+0x166/0x168 > [<000000000011eede>] do_no_context+0xba/0xf8 > [<000000000061ad28>] do_asce_exception+0x130/0x18c > [<0000000000618918>] pgm_exit+0x0/0x4 > [<000000000048a0d4>] scsi_print_command+0xe0/0xf8 > ([<000000000048a0e6>] scsi_print_command+0xf2/0xf8) > [<0000000000487f28>] scsi_log_send+0xf0/0x130 > [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc > [<0000000000490694>] scsi_request_fn+0x3b8/0x480 > [<00000000003fe4c8>] __elv_add_request+0x174/0x35c > [<0000000000402fd6>] blk_insert_cloned_request+0x72/0xcc > [<000003e001e134f2>] dm_dispatch_request+0x5a/0x94 [dm_mod] > [<000003e001e15f36>] dm_request_fn+0x216/0x398 [dm_mod] > [<0000000000405590>] queue_unplugged+0x60/0x1a8 > [<0000000000407470>] blk_flush_plug_list+0x218/0x29c > [<000000000040751e>] blk_finish_plug+0x2a/0x58 > [<000000000021034e>] generic_file_aio_read+0x22a/0x800 > [<000000000027a0a0>] do_sync_read+0xcc/0x11c > [<000000000027a974>] vfs_read+0xb0/0x194 > [<000000000027aab0>] SyS_read+0x58/0xb4 > [<00000000006185f6>] sysc_noemu+0x16/0x1c > [<000003fffd395bd4>] 0x3fffd395bd4 > INFO: lockdep is turned off. The other CPU was doing scsi_print_command successfully to the console (sclp_console_write) when this panic occurred. Earlier, we tried Alan's patch http://marc.info/?l=linux-scsi&m=130963676807728&w=2 on 3.0.4, but in contrast to Cascardo's experience http://marc.info/?l=linux-scsi&m=131482024119345&w=2 it did not work for us. Initially, we encountered use-after-free bugs in scsi_print_command / scsi_dispatch_cmd http://marc.info/?l=linux-scsi&m=130824013229933&w=2 __elv_add_request http://marc.info/?l=linux-scsi&m=130858383702949&w=2 Also related is the following thread: http://marc.info/?l=linux-scsi&m=130977887419062&w=2 Unfortunately, I don't understand what exactly is going wrong here. Can you shed some light on this? On 09/28/2011 04:14 PM, Jens Axboe wrote: > Alright, lets call this fully tested and fixed then. Linus, I committed > it, please pull: > > git://git.kernel.dk/linux-block.git for-linus > > Hannes Reinecke (1): > block: Free queue resources at blk_release_queue() > > block/blk-core.c | 13 ++++++------- > block/blk-sysfs.c | 5 +++++ > 2 files changed, 11 insertions(+), 7 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index b2ed78a..d34433a 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -348,9 +348,10 @@ void blk_put_queue(struct request_queue *q) > EXPORT_SYMBOL(blk_put_queue); > > /* > - * Note: If a driver supplied the queue lock, it should not zap that lock > - * unexpectedly as some queue cleanup components like elevator_exit() and > - * blk_throtl_exit() need queue lock. > + * Note: If a driver supplied the queue lock, it is disconnected > + * by this function. The actual state of the lock doesn't matter > + * here as the request_queue isn't accessible after this point > + * (QUEUE_FLAG_DEAD is set) and no other requests will be queued. > */ > void blk_cleanup_queue(struct request_queue *q) > { > @@ -367,10 +368,8 @@ void blk_cleanup_queue(struct request_queue *q) > queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); > mutex_unlock(&q->sysfs_lock); > > - if (q->elevator) > - elevator_exit(q->elevator); > - > - blk_throtl_exit(q); > + if (q->queue_lock !=&q->__queue_lock) > + q->queue_lock =&q->__queue_lock; > > blk_put_queue(q); > } > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index e681805..60fda88 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -479,6 +479,11 @@ static void blk_release_queue(struct kobject *kobj) > > blk_sync_queue(q); > > + if (q->elevator) > + elevator_exit(q->elevator); > + > + blk_throtl_exit(q); > + > if (rl->rq_pool) > mempool_destroy(rl->rq_pool); > > > -- Steffen Maier Linux on System z Development IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-13 13:09 ` Steffen Maier @ 2011-10-14 16:03 ` James Bottomley 2011-10-17 8:46 ` Jun'ichi Nomura 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-10-14 16:03 UTC (permalink / raw) To: Steffen Maier Cc: linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Jun'ichi Nomura, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Heiko Carstens On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > This fix also went into 3.0.5 via > http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=releases/3.0.5/block-free-queue-resources-at-blk_release_queue.patch > (originated at http://marc.info/?l=linux-scsi&m=131669751909474&w=2 and > http://marc.info/?l=linux-scsi&m=131669414205696&w=2) > > However, it seems we still have a use-after-free bug, now causing the > following oops with kernel 3.0.6 when removing paths to storage while > generating load on device-mapper multipath disks: > > > Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > Modules linked in: iptable_filter ip_tables x_tables dm_round_robin sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] > > CPU: 1 Not tainted 3.0.6-50.x.20111006-s390xdefault #1 > > Process blast.LzS_64_SL (pid: 26613, task: 0000000063e2a398, ksp: 0000000064de3560) > > Krnl PSW : 0704100180000000 000000000048a038 (scsi_print_command+0x44/0xf8) > > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 > > Krnl GPRS: 000000000000006b 6b6b6b6b6b6b6b6b 000000006717f800 000000000094f2e0 > > 000000000061242e 000000000062bd88 0000000066fb90d8 0000000065391ad7 > > 000000006717f800 000000006717f800 000000006716a090 000000006717f800 > > 0000000000000004 0000000000672f88 0000000064de3838 0000000064de3808 > > Krnl Code: 000000000048a026: f0b80004ebbf srp 4(12,%r0),3007(%r14),8 > > 000000000048a02c: f0a0000407f4 srp 4(11,%r0),2036,0 > > 000000000048a032: e31020800004 lg %r1,128(%r2) > > >000000000048a038: e31010b00004 lg %r1,176(%r1) > > 000000000048a03e: b9020011 ltgr %r1,%r1 > > 000000000048a042: a7840032 brc 8,48a0a6 > > 000000000048a046: e33020000004 lg %r3,0(%r2) > > 000000000048a04c: c04000182f4c larl %r4,78fee4 > > Call Trace: > > ([<000000006717f800>] 0x6717f800) > > [<0000000000487f28>] scsi_log_send+0xf0/0x130 > > [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc > > [<0000000000490694>] scsi_request_fn+0x3b8/0x480 Correct me if I'm wrong, but this seems to be saying that struct scsi_cmnd was used after free. This looks to be a different problem because the command has a separate refcounting model which wasn't impacted by the change ... it could be we've just exposed yet another refcounting problem outside of the queue one. If I had to guess, I'd say a bio got cloned with a SCSI command already attached, but the ref count on the SCSI command wasn't correctly adjusted. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-14 16:03 ` James Bottomley @ 2011-10-17 8:46 ` Jun'ichi Nomura 2011-10-17 14:06 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-10-17 8:46 UTC (permalink / raw) To: James Bottomley Cc: Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Heiko Carstens On 10/15/11 01:03, James Bottomley wrote: > On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: >> This fix also went into 3.0.5 via >> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=releases/3.0.5/block-free-queue-resources-at-blk_release_queue.patch >> (originated at http://marc.info/?l=linux-scsi&m=131669751909474&w=2 and >> http://marc.info/?l=linux-scsi&m=131669414205696&w=2) >> >> However, it seems we still have a use-after-free bug, now causing the >> following oops with kernel 3.0.6 when removing paths to storage while >> generating load on device-mapper multipath disks: >> >>> Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 >>> Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC >>> Modules linked in: iptable_filter ip_tables x_tables dm_round_robin sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] >>> CPU: 1 Not tainted 3.0.6-50.x.20111006-s390xdefault #1 >>> Process blast.LzS_64_SL (pid: 26613, task: 0000000063e2a398, ksp: 0000000064de3560) >>> Krnl PSW : 0704100180000000 000000000048a038 (scsi_print_command+0x44/0xf8) >>> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 >>> Krnl GPRS: 000000000000006b 6b6b6b6b6b6b6b6b 000000006717f800 000000000094f2e0 >>> 000000000061242e 000000000062bd88 0000000066fb90d8 0000000065391ad7 >>> 000000006717f800 000000006717f800 000000006716a090 000000006717f800 >>> 0000000000000004 0000000000672f88 0000000064de3838 0000000064de3808 >>> Krnl Code: 000000000048a026: f0b80004ebbf srp 4(12,%r0),3007(%r14),8 >>> 000000000048a02c: f0a0000407f4 srp 4(11,%r0),2036,0 >>> 000000000048a032: e31020800004 lg %r1,128(%r2) >>> >000000000048a038: e31010b00004 lg %r1,176(%r1) >>> 000000000048a03e: b9020011 ltgr %r1,%r1 >>> 000000000048a042: a7840032 brc 8,48a0a6 >>> 000000000048a046: e33020000004 lg %r3,0(%r2) >>> 000000000048a04c: c04000182f4c larl %r4,78fee4 >>> Call Trace: >>> ([<000000006717f800>] 0x6717f800) >>> [<0000000000487f28>] scsi_log_send+0xf0/0x130 >>> [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc >>> [<0000000000490694>] scsi_request_fn+0x3b8/0x480 > > Correct me if I'm wrong, but this seems to be saying that struct > scsi_cmnd was used after free. This looks to be a different problem > because the command has a separate refcounting model which wasn't > impacted by the change ... it could be we've just exposed yet another > refcounting problem outside of the queue one. > > If I had to guess, I'd say a bio got cloned with a SCSI command already > attached, but the ref count on the SCSI command wasn't correctly > adjusted. As far as dm is concerned, it shouldn't happen. Clone is made from a dm request, not from SCSI one. Also clone is not reused when retrying. >> Initially, we encountered use-after-free bugs in >> scsi_print_command / scsi_dispatch_cmd >> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 It is interesting that both this and the older report got oopsed in scsi_log_send(), while there are other dereferences of 'cmd' around scsi_dispatch_cmd(). Are there any reason they are special? Just by accident? Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-17 8:46 ` Jun'ichi Nomura @ 2011-10-17 14:06 ` James Bottomley 2011-10-18 13:31 ` Jun'ichi Nomura 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-10-17 14:06 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Heiko Carstens On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > On 10/15/11 01:03, James Bottomley wrote: > > On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > >> This fix also went into 3.0.5 via > >> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=releases/3.0.5/block-free-queue-resources-at-blk_release_queue.patch > >> (originated at http://marc.info/?l=linux-scsi&m=131669751909474&w=2 and > >> http://marc.info/?l=linux-scsi&m=131669414205696&w=2) > >> > >> However, it seems we still have a use-after-free bug, now causing the > >> following oops with kernel 3.0.6 when removing paths to storage while > >> generating load on device-mapper multipath disks: > >> > >>> Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > >>> Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC > >>> Modules linked in: iptable_filter ip_tables x_tables dm_round_robin sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] > >>> CPU: 1 Not tainted 3.0.6-50.x.20111006-s390xdefault #1 > >>> Process blast.LzS_64_SL (pid: 26613, task: 0000000063e2a398, ksp: 0000000064de3560) > >>> Krnl PSW : 0704100180000000 000000000048a038 (scsi_print_command+0x44/0xf8) > >>> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 > >>> Krnl GPRS: 000000000000006b 6b6b6b6b6b6b6b6b 000000006717f800 000000000094f2e0 > >>> 000000000061242e 000000000062bd88 0000000066fb90d8 0000000065391ad7 > >>> 000000006717f800 000000006717f800 000000006716a090 000000006717f800 > >>> 0000000000000004 0000000000672f88 0000000064de3838 0000000064de3808 > >>> Krnl Code: 000000000048a026: f0b80004ebbf srp 4(12,%r0),3007(%r14),8 > >>> 000000000048a02c: f0a0000407f4 srp 4(11,%r0),2036,0 > >>> 000000000048a032: e31020800004 lg %r1,128(%r2) > >>> >000000000048a038: e31010b00004 lg %r1,176(%r1) > >>> 000000000048a03e: b9020011 ltgr %r1,%r1 > >>> 000000000048a042: a7840032 brc 8,48a0a6 > >>> 000000000048a046: e33020000004 lg %r3,0(%r2) > >>> 000000000048a04c: c04000182f4c larl %r4,78fee4 > >>> Call Trace: > >>> ([<000000006717f800>] 0x6717f800) > >>> [<0000000000487f28>] scsi_log_send+0xf0/0x130 > >>> [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc > >>> [<0000000000490694>] scsi_request_fn+0x3b8/0x480 > > > > Correct me if I'm wrong, but this seems to be saying that struct > > scsi_cmnd was used after free. This looks to be a different problem > > because the command has a separate refcounting model which wasn't > > impacted by the change ... it could be we've just exposed yet another > > refcounting problem outside of the queue one. > > > > If I had to guess, I'd say a bio got cloned with a SCSI command already > > attached, but the ref count on the SCSI command wasn't correctly > > adjusted. > > As far as dm is concerned, it shouldn't happen. > Clone is made from a dm request, not from SCSI one. > Also clone is not reused when retrying. It was just a guess. Assuming the command got freed prematurely, there has to be something in the dm path to explain why the SCSI refcounting model got screwed up. Cloning a bio with an attached command was what first occurred to me, but perhaps there are other ways I'm not seeing. > >> Initially, we encountered use-after-free bugs in > >> scsi_print_command / scsi_dispatch_cmd > >> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > > It is interesting that both this and the older report > got oopsed in scsi_log_send(), while there are other > dereferences of 'cmd' around scsi_dispatch_cmd(). > Are there any reason they are special? Just by accident? Right, that's why it looks like the command area got freed rather than the command pointer was bogus (6b is a poison free pattern). Perhaps if the reporter could pin down the failing source line, we'd know better what was going on? James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-17 14:06 ` James Bottomley @ 2011-10-18 13:31 ` Jun'ichi Nomura 2011-10-18 15:45 ` Heiko Carstens 0 siblings, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-10-18 13:31 UTC (permalink / raw) To: James Bottomley, Steffen Maier Cc: linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Heiko Carstens On 10/17/11 23:06, James Bottomley wrote: > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: >> On 10/15/11 01:03, James Bottomley wrote: >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: >>>> This fix also went into 3.0.5 via >>>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=releases/3.0.5/block-free-queue-resources-at-blk_release_queue.patch >>>> (originated at http://marc.info/?l=linux-scsi&m=131669751909474&w=2 and >>>> http://marc.info/?l=linux-scsi&m=131669414205696&w=2) >>>> >>>> However, it seems we still have a use-after-free bug, now causing the >>>> following oops with kernel 3.0.6 when removing paths to storage while >>>> generating load on device-mapper multipath disks: >>>> >>>>> Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 >>>>> Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC >>>>> Modules linked in: iptable_filter ip_tables x_tables dm_round_robin sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] >>>>> CPU: 1 Not tainted 3.0.6-50.x.20111006-s390xdefault #1 >>>>> Process blast.LzS_64_SL (pid: 26613, task: 0000000063e2a398, ksp: 0000000064de3560) >>>>> Krnl PSW : 0704100180000000 000000000048a038 (scsi_print_command+0x44/0xf8) >>>>> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 >>>>> Krnl GPRS: 000000000000006b 6b6b6b6b6b6b6b6b 000000006717f800 000000000094f2e0 >>>>> 000000000061242e 000000000062bd88 0000000066fb90d8 0000000065391ad7 >>>>> 000000006717f800 000000006717f800 000000006716a090 000000006717f800 >>>>> 0000000000000004 0000000000672f88 0000000064de3838 0000000064de3808 >>>>> Krnl Code: 000000000048a026: f0b80004ebbf srp 4(12,%r0),3007(%r14),8 >>>>> 000000000048a02c: f0a0000407f4 srp 4(11,%r0),2036,0 >>>>> 000000000048a032: e31020800004 lg %r1,128(%r2) >>>>> >000000000048a038: e31010b00004 lg %r1,176(%r1) >>>>> 000000000048a03e: b9020011 ltgr %r1,%r1 >>>>> 000000000048a042: a7840032 brc 8,48a0a6 >>>>> 000000000048a046: e33020000004 lg %r3,0(%r2) >>>>> 000000000048a04c: c04000182f4c larl %r4,78fee4 >>>>> Call Trace: >>>>> ([<000000006717f800>] 0x6717f800) >>>>> [<0000000000487f28>] scsi_log_send+0xf0/0x130 >>>>> [<000000000048824c>] scsi_dispatch_cmd+0xc8/0x4bc >>>>> [<0000000000490694>] scsi_request_fn+0x3b8/0x480 >>> >>> Correct me if I'm wrong, but this seems to be saying that struct >>> scsi_cmnd was used after free. This looks to be a different problem >>> because the command has a separate refcounting model which wasn't >>> impacted by the change ... it could be we've just exposed yet another >>> refcounting problem outside of the queue one. >>> >>> If I had to guess, I'd say a bio got cloned with a SCSI command already >>> attached, but the ref count on the SCSI command wasn't correctly >>> adjusted. >> >> As far as dm is concerned, it shouldn't happen. >> Clone is made from a dm request, not from SCSI one. >> Also clone is not reused when retrying. > > It was just a guess. Assuming the command got freed prematurely, there > has to be something in the dm path to explain why the SCSI refcounting > model got screwed up. Cloning a bio with an attached command was what > first occurred to me, but perhaps there are other ways I'm not seeing. > >>>> Initially, we encountered use-after-free bugs in >>>> scsi_print_command / scsi_dispatch_cmd >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 >> >> It is interesting that both this and the older report >> got oopsed in scsi_log_send(), while there are other >> dereferences of 'cmd' around scsi_dispatch_cmd(). >> Are there any reason they are special? Just by accident? > > Right, that's why it looks like the command area got freed rather than > the command pointer was bogus (6b is a poison free pattern). Perhaps if > the reporter could pin down the failing source line, we'd know better > what was going on? Yeah, that might be useful. One remote possibility I imagined is if the submitting process took very long after blk_start_request until scsi_dispatch_cmd and timeout handler kicks in, can cmd be freed? Also Tejun's report here could be related to possible data corruption: [PATCH] block: make gendisk hold a reference to its queue https://lkml.org/lkml/2011/10/16/148 Though I haven't hit the reported oops myself, a process closing a removed device may modify freed memory. And his patch will fix it. So if the problem is easily reproducible, I think it's worth trying his patch to see if the problem disappear. Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-18 13:31 ` Jun'ichi Nomura @ 2011-10-18 15:45 ` Heiko Carstens 2011-10-18 16:29 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-10-18 15:45 UTC (permalink / raw) To: Jun'ichi Nomura Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: > On 10/17/11 23:06, James Bottomley wrote: > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > >> On 10/15/11 01:03, James Bottomley wrote: > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > >>>> Initially, we encountered use-after-free bugs in > >>>> scsi_print_command / scsi_dispatch_cmd > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > >> > >> It is interesting that both this and the older report > >> got oopsed in scsi_log_send(), while there are other > >> dereferences of 'cmd' around scsi_dispatch_cmd(). > >> Are there any reason they are special? Just by accident? > > > > Right, that's why it looks like the command area got freed rather than > > the command pointer was bogus (6b is a poison free pattern). Perhaps if > > the reporter could pin down the failing source line, we'd know better > > what was going on? > > Yeah, that might be useful. The struct scsi_cmnd that was passed to scsi_log_send() was already freed (contents completely 6b6b6b...). Since SLUB debugging was turned on we can see that it was freed from __scsi_put_command(). Not too much of a surprise. > One remote possibility I imagined is if the submitting process > took very long after blk_start_request until scsi_dispatch_cmd > and timeout handler kicks in, can cmd be freed? > > Also Tejun's report here could be related to possible data corruption: > [PATCH] block: make gendisk hold a reference to its queue > https://lkml.org/lkml/2011/10/16/148 > > Though I haven't hit the reported oops myself, > a process closing a removed device may modify freed memory. > And his patch will fix it. > > So if the problem is easily reproducible, I think it's worth > trying his patch to see if the problem disappear. In addition it might be worthwhile to apply this small slub patch as well, so that we get a full trace of the caller in the object which was freed: --- mm/slub.c | 11 +++++++++++ 1 file changed, 11 insertions(+) --- a/mm/slub.c +++ b/mm/slub.c @@ -27,6 +27,7 @@ #include <linux/memory.h> #include <linux/math64.h> #include <linux/fault-inject.h> +#include <linux/stacktrace.h> #include <trace/events/kmem.h> @@ -191,11 +192,14 @@ static LIST_HEAD(slab_caches); /* * Tracking user of a slab. */ +#define SAVE_TRACE_NUM 30 struct track { unsigned long addr; /* Called from address */ int cpu; /* Was running on cpu */ int pid; /* Pid context */ unsigned long when; /* When did the operation occur */ + unsigned long trace[SAVE_TRACE_NUM]; + unsigned long long stamp; }; enum track_item { TRACK_ALLOC, TRACK_FREE }; @@ -397,10 +401,17 @@ static void set_track(struct kmem_cache struct track *p = get_track(s, object, alloc); if (addr) { + struct stack_trace trace = { + .max_entries = SAVE_TRACE_NUM, + .entries = p->trace, + }; + p->addr = addr; p->cpu = smp_processor_id(); p->pid = current->pid; p->when = jiffies; + p->stamp = get_clock(); + save_stack_trace(&trace); } else memset(p, 0, sizeof(struct track)); } ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-18 15:45 ` Heiko Carstens @ 2011-10-18 16:29 ` James Bottomley 2011-10-31 10:05 ` Heiko Carstens 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2011-10-18 16:29 UTC (permalink / raw) To: Heiko Carstens Cc: Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote: > On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: > > On 10/17/11 23:06, James Bottomley wrote: > > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > > >> On 10/15/11 01:03, James Bottomley wrote: > > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > > >>>> Initially, we encountered use-after-free bugs in > > >>>> scsi_print_command / scsi_dispatch_cmd > > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > > >> > > >> It is interesting that both this and the older report > > >> got oopsed in scsi_log_send(), while there are other > > >> dereferences of 'cmd' around scsi_dispatch_cmd(). > > >> Are there any reason they are special? Just by accident? > > > > > > Right, that's why it looks like the command area got freed rather than > > > the command pointer was bogus (6b is a poison free pattern). Perhaps if > > > the reporter could pin down the failing source line, we'd know better > > > what was going on? > > > > Yeah, that might be useful. > > The struct scsi_cmnd that was passed to scsi_log_send() was already freed > (contents completely 6b6b6b...). > Since SLUB debugging was turned on we can see that it was freed from > __scsi_put_command(). Not too much of a surprise. But it does tell us the put must be racing with dispatch, since dereferencing the command to find the device worked higher up in scsi_dispatch_cmd(). There is one way to invalidate the theory that we cloned something with an attached command, and that's to put BUG_ON(rq->special) in blk_insert_cloned_request(). I think we're careful about clearing it, so it should work (perhaps a warn on just in case). James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-18 16:29 ` James Bottomley @ 2011-10-31 10:05 ` Heiko Carstens 2011-10-31 10:42 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-10-31 10:05 UTC (permalink / raw) To: James Bottomley Cc: Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On Tue, Oct 18, 2011 at 11:29:40AM -0500, James Bottomley wrote: > On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote: > > On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: > > > On 10/17/11 23:06, James Bottomley wrote: > > > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > > > >> On 10/15/11 01:03, James Bottomley wrote: > > > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > > > >>>> Initially, we encountered use-after-free bugs in > > > >>>> scsi_print_command / scsi_dispatch_cmd > > > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > > > >> > > > >> It is interesting that both this and the older report > > > >> got oopsed in scsi_log_send(), while there are other > > > >> dereferences of 'cmd' around scsi_dispatch_cmd(). > > > >> Are there any reason they are special? Just by accident? > > > > > > > > Right, that's why it looks like the command area got freed rather than > > > > the command pointer was bogus (6b is a poison free pattern). Perhaps if > > > > the reporter could pin down the failing source line, we'd know better > > > > what was going on? > > > > > > Yeah, that might be useful. > > > > The struct scsi_cmnd that was passed to scsi_log_send() was already freed > > (contents completely 6b6b6b...). > > Since SLUB debugging was turned on we can see that it was freed from > > __scsi_put_command(). Not too much of a surprise. > > But it does tell us the put must be racing with dispatch, since > dereferencing the command to find the device worked higher up in > scsi_dispatch_cmd(). > > There is one way to invalidate the theory that we cloned something with > an attached command, and that's to put > > BUG_ON(rq->special) > > in blk_insert_cloned_request(). I think we're careful about clearing > it, so it should work (perhaps a warn on just in case). It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las t unloaded: scsi_wait_scan] [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) [ 4024.938060] Call Trace: [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc [ 4024.938118] INFO: lockdep is turned off. [ 4024.938121] Last Breaking-Event-Address: [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] [ 4024.938135] [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 [ 4024.938205] Call Trace: [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc [ 4024.938373] INFO: lockdep is turned off. So we thought we might as well upgrade to 3.1 but immediately got a kernel BUG at block/blk-flush.c:323! which was handled here https://lkml.org/lkml/2011/10/4/105 and here https://lkml.org/lkml/2011/10/12/408 . But no patches for that one went upstream AFAICS. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 10:05 ` Heiko Carstens @ 2011-10-31 10:42 ` James Bottomley 2011-10-31 11:46 ` Jun'ichi Nomura 2011-10-31 13:21 ` Mike Snitzer 0 siblings, 2 replies; 61+ messages in thread From: James Bottomley @ 2011-10-31 10:42 UTC (permalink / raw) To: Heiko Carstens Cc: Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On Mon, 2011-10-31 at 11:05 +0100, Heiko Carstens wrote: > On Tue, Oct 18, 2011 at 11:29:40AM -0500, James Bottomley wrote: > > On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote: > > > On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: > > > > On 10/17/11 23:06, James Bottomley wrote: > > > > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > > > > >> On 10/15/11 01:03, James Bottomley wrote: > > > > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > > > > >>>> Initially, we encountered use-after-free bugs in > > > > >>>> scsi_print_command / scsi_dispatch_cmd > > > > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > > > > >> > > > > >> It is interesting that both this and the older report > > > > >> got oopsed in scsi_log_send(), while there are other > > > > >> dereferences of 'cmd' around scsi_dispatch_cmd(). > > > > >> Are there any reason they are special? Just by accident? > > > > > > > > > > Right, that's why it looks like the command area got freed rather than > > > > > the command pointer was bogus (6b is a poison free pattern). Perhaps if > > > > > the reporter could pin down the failing source line, we'd know better > > > > > what was going on? > > > > > > > > Yeah, that might be useful. > > > > > > The struct scsi_cmnd that was passed to scsi_log_send() was already freed > > > (contents completely 6b6b6b...). > > > Since SLUB debugging was turned on we can see that it was freed from > > > __scsi_put_command(). Not too much of a surprise. > > > > But it does tell us the put must be racing with dispatch, since > > dereferencing the command to find the device worked higher up in > > scsi_dispatch_cmd(). > > > > There is one way to invalidate the theory that we cloned something with > > an attached command, and that's to put > > > > BUG_ON(rq->special) > > > > in blk_insert_cloned_request(). I think we're careful about clearing > > it, so it should work (perhaps a warn on just in case). > > It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: > > [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 > [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC > [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las > t unloaded: scsi_wait_scan] > [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 > [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) > [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 > [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d > [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 > [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 > [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 > [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 > [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) > [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) > [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) > [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) > [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 > [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 > [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) > [ 4024.938060] Call Trace: > [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) > [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 > [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 > [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc > [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc > [ 4024.938118] INFO: lockdep is turned off. > [ 4024.938121] Last Breaking-Event-Address: > [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] > [ 4024.938135] > [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt > [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 > [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 > [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a > [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 > [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 > [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 > [ 4024.938205] Call Trace: > [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) > [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 > [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 > [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 > [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 > [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 > [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] > [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) > [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 > [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 > [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc > [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc > [ 4024.938373] INFO: lockdep is turned off. > > So we thought we might as well upgrade to 3.1 but immediately got a > > kernel BUG at block/blk-flush.c:323! > > which was handled here https://lkml.org/lkml/2011/10/4/105 and > here https://lkml.org/lkml/2011/10/12/408 . > > But no patches for that one went upstream AFAICS. Well, all I can say is "hm". You put only a BUG_ON() in the code, which wasn't triggered, but now we get a completely different oops. However, I think it does point to the dm barrier handling code. Can you turn off barriers and see if all oopses go away? James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 10:42 ` James Bottomley @ 2011-10-31 11:46 ` Jun'ichi Nomura 2011-10-31 13:00 ` Heiko Carstens 2011-10-31 13:21 ` Mike Snitzer 1 sibling, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-10-31 11:46 UTC (permalink / raw) To: Heiko Carstens, James Bottomley Cc: Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On 10/31/11 19:42, James Bottomley wrote: > On Mon, 2011-10-31 at 11:05 +0100, Heiko Carstens wrote: >> On Tue, Oct 18, 2011 at 11:29:40AM -0500, James Bottomley wrote: >>> On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote: >>>> On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: >>>>> On 10/17/11 23:06, James Bottomley wrote: >>>>>> On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: >>>>>>> On 10/15/11 01:03, James Bottomley wrote: >>>>>>>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: >>>>>>>>> Initially, we encountered use-after-free bugs in >>>>>>>>> scsi_print_command / scsi_dispatch_cmd >>>>>>>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 >>>>>>> >>>>>>> It is interesting that both this and the older report >>>>>>> got oopsed in scsi_log_send(), while there are other >>>>>>> dereferences of 'cmd' around scsi_dispatch_cmd(). >>>>>>> Are there any reason they are special? Just by accident? >>>>>> >>>>>> Right, that's why it looks like the command area got freed rather than >>>>>> the command pointer was bogus (6b is a poison free pattern). Perhaps if >>>>>> the reporter could pin down the failing source line, we'd know better >>>>>> what was going on? >>>>> >>>>> Yeah, that might be useful. >>>> >>>> The struct scsi_cmnd that was passed to scsi_log_send() was already freed >>>> (contents completely 6b6b6b...). >>>> Since SLUB debugging was turned on we can see that it was freed from >>>> __scsi_put_command(). Not too much of a surprise. >>> >>> But it does tell us the put must be racing with dispatch, since >>> dereferencing the command to find the device worked higher up in >>> scsi_dispatch_cmd(). >>> >>> There is one way to invalidate the theory that we cloned something with >>> an attached command, and that's to put >>> >>> BUG_ON(rq->special) >>> >>> in blk_insert_cloned_request(). I think we're careful about clearing >>> it, so it should work (perhaps a warn on just in case). >> >> It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: >> >> [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 >> [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC >> [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las >> t unloaded: scsi_wait_scan] >> [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 >> [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) >> [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) >> [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 >> [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d >> [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 >> [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 >> [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 >> [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 >> [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) >> [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) >> [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) >> [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) >> [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 >> [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 >> [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) >> [ 4024.938060] Call Trace: >> [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) >> [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 >> [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 >> [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c >> [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 >> [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc >> [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc >> [ 4024.938118] INFO: lockdep is turned off. >> [ 4024.938121] Last Breaking-Event-Address: >> [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] >> [ 4024.938135] >> [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt >> [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 >> [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) >> [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 >> [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a >> [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 >> [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 >> [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 >> [ 4024.938205] Call Trace: >> [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) >> [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 >> [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 >> [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 >> [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 >> [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 >> [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] >> [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) >> [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 >> [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 >> [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c >> [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 >> [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc >> [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc >> [ 4024.938373] INFO: lockdep is turned off. >> >> So we thought we might as well upgrade to 3.1 but immediately got a >> >> kernel BUG at block/blk-flush.c:323! >> >> which was handled here https://lkml.org/lkml/2011/10/4/105 and >> here https://lkml.org/lkml/2011/10/12/408 . >> >> But no patches for that one went upstream AFAICS. > > Well, all I can say is "hm". You put only a BUG_ON() in the code, which > wasn't triggered, but now we get a completely different oops. However, > I think it does point to the dm barrier handling code. Can you turn off > barriers and see if all oopses go away? Hm, dm_softirq_done is generic completion code of original request in dm-multipath. So oops here might be another manifestation of use-after-free. Do you always hit the oops at the same address? Could you find corresponding source code line for the crashed address, dm_softirq_done+0x72/0x140, and which pointer was invalid? -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 11:46 ` Jun'ichi Nomura @ 2011-10-31 13:00 ` Heiko Carstens 2011-11-02 12:37 ` Jun'ichi Nomura 2011-11-03 18:25 ` Mike Snitzer 0 siblings, 2 replies; 61+ messages in thread From: Heiko Carstens @ 2011-10-31 13:00 UTC (permalink / raw) To: Jun'ichi Nomura Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > Hm, dm_softirq_done is generic completion code of original > request in dm-multipath. > So oops here might be another manifestation of use-after-free. > > Do you always hit the oops at the same address? I think we saw this bug the first time. But before that the scsi logging level was higher. Gonzalo is trying to recreate it with the same (old) scsi logging level. Afterwards we will try with barrier=0. Both on v3.0.7 btw. > Could you find corresponding source code line for > the crashed address, dm_softirq_done+0x72/0x140, > and which pointer was invalid? It crashes in the inlined function dm_done() when trying to dereference tio (aka clone->end_io_data): static void dm_done(struct request *clone, int error, bool mapped) { int r = error; struct dm_rq_target_io *tio = clone->end_io_data; dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 13:00 ` Heiko Carstens @ 2011-11-02 12:37 ` Jun'ichi Nomura 2011-11-02 12:44 ` Hannes Reinecke 2011-11-02 13:47 ` Heiko Carstens 2011-11-03 18:25 ` Mike Snitzer 1 sibling, 2 replies; 61+ messages in thread From: Jun'ichi Nomura @ 2011-11-02 12:37 UTC (permalink / raw) To: Heiko Carstens Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On 10/31/11 22:00, Heiko Carstens wrote: > On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >> Hm, dm_softirq_done is generic completion code of original >> request in dm-multipath. >> So oops here might be another manifestation of use-after-free. >> >> Do you always hit the oops at the same address? > > I think we saw this bug the first time. But before that the scsi > logging level was higher. Gonzalo is trying to recreate it with > the same (old) scsi logging level. > Afterwards we will try with barrier=0. > > Both on v3.0.7 btw. > >> Could you find corresponding source code line for >> the crashed address, dm_softirq_done+0x72/0x140, >> and which pointer was invalid? > > It crashes in the inlined function dm_done() when trying to > dereference tio (aka clone->end_io_data): > > static void dm_done(struct request *clone, int error, bool mapped) > { > int r = error; > struct dm_rq_target_io *tio = clone->end_io_data; > dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; Thank you. But, hmm. I have no idea about scenario. struct dm_rq_target_io is a container of clone request and clone->end_io_data points to its container. struct dm_rq_target_io { struct mapped_device *md; struct dm_target *ti; struct request *orig, clone; int error; union map_info info; }; If clone can be dereferenced, clone->end_io_data should be, too. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-02 12:37 ` Jun'ichi Nomura @ 2011-11-02 12:44 ` Hannes Reinecke 2011-11-02 13:47 ` Heiko Carstens 1 sibling, 0 replies; 61+ messages in thread From: Hannes Reinecke @ 2011-11-02 12:44 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On 11/02/2011 01:37 PM, Jun'ichi Nomura wrote: > On 10/31/11 22:00, Heiko Carstens wrote: >> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>> Hm, dm_softirq_done is generic completion code of original >>> request in dm-multipath. >>> So oops here might be another manifestation of use-after-free. >>> >>> Do you always hit the oops at the same address? >> >> I think we saw this bug the first time. But before that the scsi >> logging level was higher. Gonzalo is trying to recreate it with >> the same (old) scsi logging level. >> Afterwards we will try with barrier=0. >> >> Both on v3.0.7 btw. >> >>> Could you find corresponding source code line for >>> the crashed address, dm_softirq_done+0x72/0x140, >>> and which pointer was invalid? >> >> It crashes in the inlined function dm_done() when trying to >> dereference tio (aka clone->end_io_data): >> >> static void dm_done(struct request *clone, int error, bool mapped) >> { >> int r = error; >> struct dm_rq_target_io *tio = clone->end_io_data; >> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > > Thank you. But, hmm. I have no idea about scenario. > > struct dm_rq_target_io is a container of clone request > and clone->end_io_data points to its container. > > struct dm_rq_target_io { > struct mapped_device *md; > struct dm_target *ti; > struct request *orig, clone; > int error; > union map_info info; > }; > > If clone can be dereferenced, clone->end_io_data should be, too. > Well, actually it _always_ can be dereferenced. At the very least we'd need to do an integrity check, ie if the pointer 'clone->end_io_data' is indeed of the required type. More to the point, the end_io_data pointer could've been assigned to something else; so even though the pointer is set (which we don't check, either), it might not be pointing to a 'struct dm_rq_target_io'. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-02 12:37 ` Jun'ichi Nomura 2011-11-02 12:44 ` Hannes Reinecke @ 2011-11-02 13:47 ` Heiko Carstens 2011-11-04 4:07 ` Jun'ichi Nomura 1 sibling, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-11-02 13:47 UTC (permalink / raw) To: Jun'ichi Nomura Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On Wed, Nov 02, 2011 at 09:37:06PM +0900, Jun'ichi Nomura wrote: > On 10/31/11 22:00, Heiko Carstens wrote: > > On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > >> Hm, dm_softirq_done is generic completion code of original > >> request in dm-multipath. > >> So oops here might be another manifestation of use-after-free. > >> > >> Do you always hit the oops at the same address? > > > > I think we saw this bug the first time. But before that the scsi > > logging level was higher. Gonzalo is trying to recreate it with > > the same (old) scsi logging level. > > Afterwards we will try with barrier=0. > > > > Both on v3.0.7 btw. > > > >> Could you find corresponding source code line for > >> the crashed address, dm_softirq_done+0x72/0x140, > >> and which pointer was invalid? > > > > It crashes in the inlined function dm_done() when trying to > > dereference tio (aka clone->end_io_data): > > > > static void dm_done(struct request *clone, int error, bool mapped) > > { > > int r = error; > > struct dm_rq_target_io *tio = clone->end_io_data; > > dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > > Thank you. But, hmm. I have no idea about scenario. > > struct dm_rq_target_io is a container of clone request > and clone->end_io_data points to its container. > > struct dm_rq_target_io { > struct mapped_device *md; > struct dm_target *ti; > struct request *orig, clone; > int error; > union map_info info; > }; > > If clone can be dereferenced, clone->end_io_data should be, too. If it helps: the above *ti pointer is the only one that points to an (invalid) vmalloc area address. Invalid means the page was unmapped because it was freed because of DEBUG_PAGEALLOC. All other addresses I followed to get to this one belong to the 1:1 mapping of the kernel, so no vmalloc involved. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-02 13:47 ` Heiko Carstens @ 2011-11-04 4:07 ` Jun'ichi Nomura 2011-11-04 9:12 ` Heiko Carstens 0 siblings, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-11-04 4:07 UTC (permalink / raw) To: Heiko Carstens Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On 11/02/11 22:47, Heiko Carstens wrote: > On Wed, Nov 02, 2011 at 09:37:06PM +0900, Jun'ichi Nomura wrote: >> On 10/31/11 22:00, Heiko Carstens wrote: >>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>>> Hm, dm_softirq_done is generic completion code of original >>>> request in dm-multipath. >>>> So oops here might be another manifestation of use-after-free. >>>> >>>> Do you always hit the oops at the same address? >>> >>> I think we saw this bug the first time. But before that the scsi >>> logging level was higher. Gonzalo is trying to recreate it with >>> the same (old) scsi logging level. >>> Afterwards we will try with barrier=0. >>> >>> Both on v3.0.7 btw. >>> >>>> Could you find corresponding source code line for >>>> the crashed address, dm_softirq_done+0x72/0x140, >>>> and which pointer was invalid? >>> >>> It crashes in the inlined function dm_done() when trying to >>> dereference tio (aka clone->end_io_data): >>> >>> static void dm_done(struct request *clone, int error, bool mapped) >>> { >>> int r = error; >>> struct dm_rq_target_io *tio = clone->end_io_data; >>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; >> >> Thank you. But, hmm. I have no idea about scenario. >> >> struct dm_rq_target_io is a container of clone request >> and clone->end_io_data points to its container. >> >> struct dm_rq_target_io { >> struct mapped_device *md; >> struct dm_target *ti; >> struct request *orig, clone; >> int error; >> union map_info info; >> }; >> >> If clone can be dereferenced, clone->end_io_data should be, too. > > If it helps: the above *ti pointer is the only one that points to > an (invalid) vmalloc area address. Invalid means the page was unmapped > because it was freed because of DEBUG_PAGEALLOC. > All other addresses I followed to get to this one belong to > the 1:1 mapping of the kernel, so no vmalloc involved. Thanks, ok it was ti which was invalid. Not tio. ti is a pointer to dm table entry, which is vmalloc-ed. So it means the dm table was replaced while I/O was in-flight. dm has a machanism to prevent it: in dm_suspend(), stop_queue() is called to stop block queue processing and no new I/O becomes in-flight after that. Then all in-flight I/Os are waited to be completed or requeued (dm_wait_for_completion()). If the wait was successful, the table can become "suspended", i.e. ready to be replaced. So ti should be always valid. Hmm.. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 4:07 ` Jun'ichi Nomura @ 2011-11-04 9:12 ` Heiko Carstens 0 siblings, 0 replies; 61+ messages in thread From: Heiko Carstens @ 2011-11-04 9:12 UTC (permalink / raw) To: Jun'ichi Nomura Cc: James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, Mike Snitzer, gmuelas On Fri, Nov 04, 2011 at 01:07:53PM +0900, Jun'ichi Nomura wrote: > Thanks, ok it was ti which was invalid. Not tio. > ti is a pointer to dm table entry, which is vmalloc-ed. > So it means the dm table was replaced while I/O was in-flight. > > dm has a machanism to prevent it: in dm_suspend(), > stop_queue() is called to stop block queue processing > and no new I/O becomes in-flight after that. > Then all in-flight I/Os are waited to be completed or requeued > (dm_wait_for_completion()). > If the wait was successful, the table can become "suspended", > i.e. ready to be replaced. > > So ti should be always valid. > Hmm.. Here is another one (crash at same place in code). Might be of interest since userspace triggered a call to dm_suspend(). While executing an irq arrives and the same crash as observed before happened: [96875.876768] Unable to handle kernel pointer dereference at virtual kernel address 000003e004bcf000 [96875.876773] Oops: 0011 [#2] PREEMPT SMP DEBUG_PAGEALLOC [96875.876780] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [last unloaded: scsi_wait_scan] [96875.876800] CPU: 0 Tainted: G D W 3.0.7-50.x.20111024-s390xdefault #1 [96875.876804] Process kpartx (pid: 36986, task: 0000000058754750, ksp: 000000005874b318) [96875.876808] Krnl PSW : 0704000180000000 000003e0012e363a (dm_softirq_done+0x72/0x140 [dm_mod]) [96875.876819] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 [96875.876825] Krnl GPRS: 000000007b9156b0 000003e004bcf100 00000000586f2f40 0000000000000065 [96875.876829] 00000000586f3610 00000000717a9f58 0000000000000002 0000000000000005 [96875.876834] 0000000000000000 0400000000000102 00000000717a9f70 0000000000000000 [96875.876838] 000003e0012e1000 000003e0012f0098 000000007a9c7c40 000000007a9c7bf0 [96875.876852] Krnl Code: 000003e0012e362a: f0a0000407f1 srp 4(11,%r0),2033,0 [96875.876858] 000003e0012e3630: e31050080004 lg %r1,8(%r5) [96875.876864] 000003e0012e3636: 58b05180 l %r11,384(%r5) [96875.876869] >000003e0012e363a: e31010080004 lg %r1,8(%r1) [96875.876875] 000003e0012e3640: e31010500004 lg %r1,80(%r1) [96875.876880] 000003e0012e3646: b9020011 ltgr %r1,%r1 [96875.876885] 000003e0012e364a: a784ffdf brc 8,3e0012e3608 [96875.876891] 000003e0012e364e: e32050080004 lg %r2,8(%r5) [96875.876895] Call Trace: [96875.876898] ([<070000000040716c>] 0x70000000040716c) [96875.876902] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 [96875.876909] [<00000000001587c2>] __do_softirq+0xda/0x398 [96875.876914] [<000000000010f47e>] do_softirq+0xe2/0xe8 [96875.876919] [<0000000000158e2c>] irq_exit+0xc8/0xcc [96875.876924] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc [96875.876930] [<000000000061a164>] io_return+0x0/0x16 [96875.876935] [<000000000061c2a0>] sub_preempt_count+0x34/0xd4 [96875.876941] ([<000000005874ba38>] 0x5874ba38) [96875.876945] [<000000000061918c>] _raw_spin_unlock_irq+0x50/0x7c [96875.876951] [<00000000001729ea>] flush_workqueue_prep_cwqs+0x222/0x4dc [96875.876959] [<0000000000174718>] flush_workqueue+0x1e8/0x4c0 [96875.876964] [<000003e0012e63e8>] dm_suspend+0x174/0x384 [dm_mod] [96875.876973] [<000003e0012ebe1e>] dev_suspend+0x21e/0x250 [dm_mod] [96875.876983] [<000003e0012eccb6>] ctl_ioctl+0x1e2/0x2f4 [dm_mod] [96875.876992] [<000003e0012ecdf2>] dm_ctl_ioctl+0x2a/0x38 [dm_mod] [96875.877000] [<0000000000291c98>] do_vfs_ioctl+0x94/0x588 [96875.877005] [<0000000000292220>] SyS_ioctl+0x94/0xac [96875.877010] [<0000000000619af2>] sysc_noemu+0x16/0x1c [96875.877015] [<000003fffd32f7ca>] 0x3fffd32f7ca [96875.877019] INFO: lockdep is turned off. [96875.877022] Last Breaking-Event-Address: [96875.877025] [<000003e0012e3600>] dm_softirq_done+0x38/0x140 [dm_mod] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 13:00 ` Heiko Carstens 2011-11-02 12:37 ` Jun'ichi Nomura @ 2011-11-03 18:25 ` Mike Snitzer 2011-11-04 9:19 ` Heiko Carstens 1 sibling, 1 reply; 61+ messages in thread From: Mike Snitzer @ 2011-11-03 18:25 UTC (permalink / raw) To: Heiko Carstens Cc: Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Oct 31 2011 at 9:00am -0400, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > > Hm, dm_softirq_done is generic completion code of original > > request in dm-multipath. > > So oops here might be another manifestation of use-after-free. > > > > Do you always hit the oops at the same address? > > I think we saw this bug the first time. But before that the scsi > logging level was higher. Gonzalo is trying to recreate it with > the same (old) scsi logging level. > Afterwards we will try with barrier=0. > > Both on v3.0.7 btw. > > > Could you find corresponding source code line for > > the crashed address, dm_softirq_done+0x72/0x140, > > and which pointer was invalid? > > It crashes in the inlined function dm_done() when trying to > dereference tio (aka clone->end_io_data): > > static void dm_done(struct request *clone, int error, bool mapped) > { > int r = error; > struct dm_rq_target_io *tio = clone->end_io_data; > dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; Hi, Which underlying storage driver is being used by this multipath device? Thanks, Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-03 18:25 ` Mike Snitzer @ 2011-11-04 9:19 ` Heiko Carstens 2011-11-04 13:30 ` Mike Snitzer ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Heiko Carstens @ 2011-11-04 9:19 UTC (permalink / raw) To: Mike Snitzer Cc: Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: > On Mon, Oct 31 2011 at 9:00am -0400, > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > > > Hm, dm_softirq_done is generic completion code of original > > > request in dm-multipath. > > > So oops here might be another manifestation of use-after-free. > > > > > > Do you always hit the oops at the same address? > > > > I think we saw this bug the first time. But before that the scsi > > logging level was higher. Gonzalo is trying to recreate it with > > the same (old) scsi logging level. > > Afterwards we will try with barrier=0. > > > > Both on v3.0.7 btw. > > > > > Could you find corresponding source code line for > > > the crashed address, dm_softirq_done+0x72/0x140, > > > and which pointer was invalid? > > > > It crashes in the inlined function dm_done() when trying to > > dereference tio (aka clone->end_io_data): > > > > static void dm_done(struct request *clone, int error, bool mapped) > > { > > int r = error; > > struct dm_rq_target_io *tio = clone->end_io_data; > > dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > > Hi, > > Which underlying storage driver is being used by this multipath device? It's the s390 only zfcp device driver. FWIW, yet another use-after-free crash, this time however in multipath_end_io: [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 [96875.870602] Oops: 0038 [#1] [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ st unloaded: scsi_wait_scan] [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) [96875.870889] Call Trace: [96875.870892] ([<0700000000000008>] 0x700000000000008) [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc [96875.870953] [<000000000061a164>] io_return+0x0/0x16 [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e [96875.871028] INFO: lockdep is turned off. [96875.871031] Last Breaking-Event-Address: [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] static int multipath_end_io(struct dm_target *ti, struct request *clone, int error, union map_info *map_context) { struct multipath *m = ti->private; struct dm_mpath_io *mpio = map_context->ptr; struct pgpath *pgpath = mpio->pgpath; struct path_selector *ps; int r; r = do_end_io(m, clone, error, mpio); if (pgpath) { ps = &pgpath->pg->ps; <--- crashes here if (ps->type->end_io) ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); } mempool_free(mpio, m->mpio_pool); return r; } It crashes when trying to derefence pgpath, which was freed. Since we have SLUB debugging turned on the freed object tells us that it was allocated via a call to multipath_ctr() and freed via a call to free_priority_group(). FWIW, this reproduction was done with barriers off. Btw, since I cc'ed you rather late I'm not sure if you are aware of the test scenario: it's I/O stress with multipathing were paths come and go all the time. It usually takes quite a few hours before the crashes are observed. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 9:19 ` Heiko Carstens @ 2011-11-04 13:30 ` Mike Snitzer 2011-11-04 13:37 ` Hannes Reinecke ` (2 more replies) 2011-11-07 11:30 ` Jun'ichi Nomura 2011-11-09 9:37 ` Hannes Reinecke 2 siblings, 3 replies; 61+ messages in thread From: Mike Snitzer @ 2011-11-04 13:30 UTC (permalink / raw) To: Heiko Carstens Cc: Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Fri, Nov 04 2011 at 5:19am -0400, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: > > On Mon, Oct 31 2011 at 9:00am -0400, > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > > > On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > > > > Hm, dm_softirq_done is generic completion code of original > > > > request in dm-multipath. > > > > So oops here might be another manifestation of use-after-free. > > > > > > > > Do you always hit the oops at the same address? > > > > > > I think we saw this bug the first time. But before that the scsi > > > logging level was higher. Gonzalo is trying to recreate it with > > > the same (old) scsi logging level. > > > Afterwards we will try with barrier=0. > > > > > > Both on v3.0.7 btw. > > > > > > > Could you find corresponding source code line for > > > > the crashed address, dm_softirq_done+0x72/0x140, > > > > and which pointer was invalid? > > > > > > It crashes in the inlined function dm_done() when trying to > > > dereference tio (aka clone->end_io_data): > > > > > > static void dm_done(struct request *clone, int error, bool mapped) > > > { > > > int r = error; > > > struct dm_rq_target_io *tio = clone->end_io_data; > > > dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > > > > Hi, > > > > Which underlying storage driver is being used by this multipath device? > > It's the s390 only zfcp device driver. > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > [96875.870602] Oops: 0038 [#1] > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > st unloaded: scsi_wait_scan] > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > [96875.870889] Call Trace: > [96875.870892] ([<0700000000000008>] 0x700000000000008) > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > [96875.871028] INFO: lockdep is turned off. > [96875.871031] Last Breaking-Event-Address: > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > > static int multipath_end_io(struct dm_target *ti, struct request *clone, > int error, union map_info *map_context) > { > struct multipath *m = ti->private; > struct dm_mpath_io *mpio = map_context->ptr; > struct pgpath *pgpath = mpio->pgpath; > struct path_selector *ps; > int r; > > r = do_end_io(m, clone, error, mpio); > if (pgpath) { > ps = &pgpath->pg->ps; <--- crashes here > if (ps->type->end_io) > ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); > } > mempool_free(mpio, m->mpio_pool); > > return r; > } > > It crashes when trying to derefence pgpath, which was freed. Since we have > SLUB debugging turned on the freed object tells us that it was allocated > via a call to multipath_ctr() and freed via a call to free_priority_group(). > > FWIW, this reproduction was done with barriers off. > > Btw, since I cc'ed you rather late I'm not sure if you are aware of the > test scenario: it's I/O stress with multipathing were paths come and go > all the time. It usually takes quite a few hours before the crashes are > observed. OK, thanks for the backstory. That is the same type of testing we've been doing with some partners for RHEL6.2 with the qla2xxx driver. They have seen the same crash that you originally reported here: https://lkml.org/lkml/2011/10/31/64 The really interesting observation that was made is that the qla2xxx driver was made lockless in RHEL6.2. We've found that reverting the qla2xxx lockless changes eliminates the problems seen with it and I/O stress testing with multipath path failures. The zfcp driver was also made lockless upstream, via this commit: e55f875 [SCSI] zfcp: Issue FCP command without holding SCSI host_lock It would be great if you could try reverting e55f875 and see how your testing goes. If doing so resolves the crashes for you then the post mortem on why these lockless SCSI driver changes are causing such odd multipath completion failures is going to be "fun" ;) Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 13:30 ` Mike Snitzer @ 2011-11-04 13:37 ` Hannes Reinecke 2011-11-07 11:31 ` Jun'ichi Nomura 2011-11-07 12:23 ` Heiko Carstens 2 siblings, 0 replies; 61+ messages in thread From: Hannes Reinecke @ 2011-11-04 13:37 UTC (permalink / raw) To: Mike Snitzer Cc: Heiko Carstens, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On 11/04/2011 02:30 PM, Mike Snitzer wrote: > On Fri, Nov 04 2011 at 5:19am -0400, > Heiko Carstens<heiko.carstens@de.ibm.com> wrote: > >> On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: >>> On Mon, Oct 31 2011 at 9:00am -0400, >>> Heiko Carstens<heiko.carstens@de.ibm.com> wrote: >>> >>>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>>>> Hm, dm_softirq_done is generic completion code of original >>>>> request in dm-multipath. >>>>> So oops here might be another manifestation of use-after-free. >>>>> >>>>> Do you always hit the oops at the same address? >>>> >>>> I think we saw this bug the first time. But before that the scsi >>>> logging level was higher. Gonzalo is trying to recreate it with >>>> the same (old) scsi logging level. >>>> Afterwards we will try with barrier=0. >>>> >>>> Both on v3.0.7 btw. >>>> >>>>> Could you find corresponding source code line for >>>>> the crashed address, dm_softirq_done+0x72/0x140, >>>>> and which pointer was invalid? >>>> >>>> It crashes in the inlined function dm_done() when trying to >>>> dereference tio (aka clone->end_io_data): >>>> >>>> static void dm_done(struct request *clone, int error, bool mapped) >>>> { >>>> int r = error; >>>> struct dm_rq_target_io *tio = clone->end_io_data; >>>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; >>> >>> Hi, >>> >>> Which underlying storage driver is being used by this multipath device? >> >> It's the s390 only zfcp device driver. >> >> FWIW, yet another use-after-free crash, this time however in multipath_end_io: >> >> [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 >> [96875.870602] Oops: 0038 [#1] >> [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC >> [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ >> st unloaded: scsi_wait_scan] >> [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 >> [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) >> [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) >> [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 >> [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 >> [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 >> [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 >> [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 >> [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 >> [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c >> [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) >> [96875.870809]>000003e001347138: e31020180004 lg %r1,24(%r2) >> [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) >> [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 >> [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c >> [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) >> [96875.870889] Call Trace: >> [96875.870892] ([<0700000000000008>] 0x700000000000008) >> [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] >> [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 >> [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 >> [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 >> [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc >> [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc >> [96875.870953] [<000000000061a164>] io_return+0x0/0x16 >> [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 >> [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) >> [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 >> [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 >> [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 >> [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c >> [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e >> [96875.871028] INFO: lockdep is turned off. >> [96875.871031] Last Breaking-Event-Address: >> [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] >> >> static int multipath_end_io(struct dm_target *ti, struct request *clone, >> int error, union map_info *map_context) >> { >> struct multipath *m = ti->private; >> struct dm_mpath_io *mpio = map_context->ptr; >> struct pgpath *pgpath = mpio->pgpath; >> struct path_selector *ps; >> int r; >> >> r = do_end_io(m, clone, error, mpio); >> if (pgpath) { >> ps =&pgpath->pg->ps;<--- crashes here >> if (ps->type->end_io) >> ps->type->end_io(ps,&pgpath->path, mpio->nr_bytes); >> } >> mempool_free(mpio, m->mpio_pool); >> >> return r; >> } >> >> It crashes when trying to derefence pgpath, which was freed. Since we have >> SLUB debugging turned on the freed object tells us that it was allocated >> via a call to multipath_ctr() and freed via a call to free_priority_group(). >> >> FWIW, this reproduction was done with barriers off. >> >> Btw, since I cc'ed you rather late I'm not sure if you are aware of the >> test scenario: it's I/O stress with multipathing were paths come and go >> all the time. It usually takes quite a few hours before the crashes are >> observed. > > OK, thanks for the backstory. > > That is the same type of testing we've been doing with some partners > for RHEL6.2 with the qla2xxx driver. They have seen the same crash that > you originally reported here: https://lkml.org/lkml/2011/10/31/64 > > The really interesting observation that was made is that the qla2xxx > driver was made lockless in RHEL6.2. We've found that reverting the > qla2xxx lockless changes eliminates the problems seen with it and I/O > stress testing with multipath path failures. > > The zfcp driver was also made lockless upstream, via this commit: > e55f875 [SCSI] zfcp: Issue FCP command without holding SCSI host_lock > > It would be great if you could try reverting e55f875 and see how your > testing goes. > > If doing so resolves the crashes for you then the post mortem on why > these lockless SCSI driver changes are causing such odd multipath > completion failures is going to be "fun" ;) > And before anyone asks: We will _not_ be reverting the lockless changes. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 13:30 ` Mike Snitzer 2011-11-04 13:37 ` Hannes Reinecke @ 2011-11-07 11:31 ` Jun'ichi Nomura 2011-11-07 13:42 ` Mike Snitzer 2011-11-07 12:23 ` Heiko Carstens 2 siblings, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-11-07 11:31 UTC (permalink / raw) To: Mike Snitzer Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On 11/04/11 22:30, Mike Snitzer wrote: > On Fri, Nov 04 2011 at 5:19am -0400, > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > >> On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: >>> On Mon, Oct 31 2011 at 9:00am -0400, >>> Heiko Carstens <heiko.carstens@de.ibm.com> wrote: >>> >>>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>>>> Hm, dm_softirq_done is generic completion code of original >>>>> request in dm-multipath. >>>>> So oops here might be another manifestation of use-after-free. >>>>> >>>>> Do you always hit the oops at the same address? >>>> >>>> I think we saw this bug the first time. But before that the scsi >>>> logging level was higher. Gonzalo is trying to recreate it with >>>> the same (old) scsi logging level. >>>> Afterwards we will try with barrier=0. >>>> >>>> Both on v3.0.7 btw. >>>> >>>>> Could you find corresponding source code line for >>>>> the crashed address, dm_softirq_done+0x72/0x140, >>>>> and which pointer was invalid? >>>> >>>> It crashes in the inlined function dm_done() when trying to >>>> dereference tio (aka clone->end_io_data): >>>> >>>> static void dm_done(struct request *clone, int error, bool mapped) >>>> { >>>> int r = error; >>>> struct dm_rq_target_io *tio = clone->end_io_data; >>>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; >>> >>> Hi, >>> >>> Which underlying storage driver is being used by this multipath device? >> >> It's the s390 only zfcp device driver. >> >> FWIW, yet another use-after-free crash, this time however in multipath_end_io: >> >> [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 >> [96875.870602] Oops: 0038 [#1] >> [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC >> [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ >> st unloaded: scsi_wait_scan] >> [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 >> [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) >> [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) >> [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 >> [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 >> [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 >> [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 >> [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 >> [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 >> [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c >> [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) >> [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) >> [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) >> [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 >> [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c >> [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) >> [96875.870889] Call Trace: >> [96875.870892] ([<0700000000000008>] 0x700000000000008) >> [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] >> [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 >> [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 >> [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 >> [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc >> [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc >> [96875.870953] [<000000000061a164>] io_return+0x0/0x16 >> [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 >> [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) >> [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 >> [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 >> [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 >> [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c >> [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e >> [96875.871028] INFO: lockdep is turned off. >> [96875.871031] Last Breaking-Event-Address: >> [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] >> >> static int multipath_end_io(struct dm_target *ti, struct request *clone, >> int error, union map_info *map_context) >> { >> struct multipath *m = ti->private; >> struct dm_mpath_io *mpio = map_context->ptr; >> struct pgpath *pgpath = mpio->pgpath; >> struct path_selector *ps; >> int r; >> >> r = do_end_io(m, clone, error, mpio); >> if (pgpath) { >> ps = &pgpath->pg->ps; <--- crashes here >> if (ps->type->end_io) >> ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); >> } >> mempool_free(mpio, m->mpio_pool); >> >> return r; >> } >> >> It crashes when trying to derefence pgpath, which was freed. Since we have >> SLUB debugging turned on the freed object tells us that it was allocated >> via a call to multipath_ctr() and freed via a call to free_priority_group(). >> >> FWIW, this reproduction was done with barriers off. >> >> Btw, since I cc'ed you rather late I'm not sure if you are aware of the >> test scenario: it's I/O stress with multipathing were paths come and go >> all the time. It usually takes quite a few hours before the crashes are >> observed. > > OK, thanks for the backstory. > > That is the same type of testing we've been doing with some partners > for RHEL6.2 with the qla2xxx driver. They have seen the same crash that > you originally reported here: https://lkml.org/lkml/2011/10/31/64 > > The really interesting observation that was made is that the qla2xxx > driver was made lockless in RHEL6.2. We've found that reverting the > qla2xxx lockless changes eliminates the problems seen with it and I/O > stress testing with multipath path failures. It's interesting. Have you analyzed how the lockless change caused the dm crash? > The zfcp driver was also made lockless upstream, via this commit: > e55f875 [SCSI] zfcp: Issue FCP command without holding SCSI host_lock > > It would be great if you could try reverting e55f875 and see how your > testing goes. > > If doing so resolves the crashes for you then the post mortem on why > these lockless SCSI driver changes are causing such odd multipath > completion failures is going to be "fun" ;) -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-07 11:31 ` Jun'ichi Nomura @ 2011-11-07 13:42 ` Mike Snitzer 0 siblings, 0 replies; 61+ messages in thread From: Mike Snitzer @ 2011-11-07 13:42 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Nov 07 2011 at 6:31am -0500, Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > On 11/04/11 22:30, Mike Snitzer wrote: > > On Fri, Nov 04 2011 at 5:19am -0400, > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > >> On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: > >>> On Mon, Oct 31 2011 at 9:00am -0400, > >>> Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > >>> > >>>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > >>>>> Hm, dm_softirq_done is generic completion code of original > >>>>> request in dm-multipath. > >>>>> So oops here might be another manifestation of use-after-free. > >>>>> > >>>>> Do you always hit the oops at the same address? > >>>> > >>>> I think we saw this bug the first time. But before that the scsi > >>>> logging level was higher. Gonzalo is trying to recreate it with > >>>> the same (old) scsi logging level. > >>>> Afterwards we will try with barrier=0. > >>>> > >>>> Both on v3.0.7 btw. > >>>> > >>>>> Could you find corresponding source code line for > >>>>> the crashed address, dm_softirq_done+0x72/0x140, > >>>>> and which pointer was invalid? > >>>> > >>>> It crashes in the inlined function dm_done() when trying to > >>>> dereference tio (aka clone->end_io_data): > >>>> > >>>> static void dm_done(struct request *clone, int error, bool mapped) > >>>> { > >>>> int r = error; > >>>> struct dm_rq_target_io *tio = clone->end_io_data; > >>>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > >>> > >>> Hi, > >>> > >>> Which underlying storage driver is being used by this multipath device? > >> > >> It's the s390 only zfcp device driver. > >> > >> FWIW, yet another use-after-free crash, this time however in multipath_end_io: > >> > >> [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > >> [96875.870602] Oops: 0038 [#1] > >> [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > >> [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > >> st unloaded: scsi_wait_scan] > >> [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > >> [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > >> [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > >> [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > >> [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > >> [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > >> [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > >> [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > >> [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > >> [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > >> [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > >> [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > >> [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > >> [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > >> [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > >> [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > >> [96875.870889] Call Trace: > >> [96875.870892] ([<0700000000000008>] 0x700000000000008) > >> [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > >> [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > >> [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > >> [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > >> [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > >> [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > >> [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > >> [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > >> [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > >> [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > >> [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > >> [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > >> [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > >> [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > >> [96875.871028] INFO: lockdep is turned off. > >> [96875.871031] Last Breaking-Event-Address: > >> [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > >> > >> static int multipath_end_io(struct dm_target *ti, struct request *clone, > >> int error, union map_info *map_context) > >> { > >> struct multipath *m = ti->private; > >> struct dm_mpath_io *mpio = map_context->ptr; > >> struct pgpath *pgpath = mpio->pgpath; > >> struct path_selector *ps; > >> int r; > >> > >> r = do_end_io(m, clone, error, mpio); > >> if (pgpath) { > >> ps = &pgpath->pg->ps; <--- crashes here > >> if (ps->type->end_io) > >> ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); > >> } > >> mempool_free(mpio, m->mpio_pool); > >> > >> return r; > >> } > >> > >> It crashes when trying to derefence pgpath, which was freed. Since we have > >> SLUB debugging turned on the freed object tells us that it was allocated > >> via a call to multipath_ctr() and freed via a call to free_priority_group(). > >> > >> FWIW, this reproduction was done with barriers off. > >> > >> Btw, since I cc'ed you rather late I'm not sure if you are aware of the > >> test scenario: it's I/O stress with multipathing were paths come and go > >> all the time. It usually takes quite a few hours before the crashes are > >> observed. > > > > OK, thanks for the backstory. > > > > That is the same type of testing we've been doing with some partners > > for RHEL6.2 with the qla2xxx driver. They have seen the same crash that > > you originally reported here: https://lkml.org/lkml/2011/10/31/64 > > > > The really interesting observation that was made is that the qla2xxx > > driver was made lockless in RHEL6.2. We've found that reverting the > > qla2xxx lockless changes eliminates the problems seen with it and I/O > > stress testing with multipath path failures. > > It's interesting. > Have you analyzed how the lockless change caused the dm crash? Not yet. It will take me some time (as I'm not overly familiar with the SCSI lockless changes). Any assistance would be appreciated. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 13:30 ` Mike Snitzer 2011-11-04 13:37 ` Hannes Reinecke 2011-11-07 11:31 ` Jun'ichi Nomura @ 2011-11-07 12:23 ` Heiko Carstens 2 siblings, 0 replies; 61+ messages in thread From: Heiko Carstens @ 2011-11-07 12:23 UTC (permalink / raw) To: Mike Snitzer Cc: Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Fri, Nov 04, 2011 at 09:30:52AM -0400, Mike Snitzer wrote: > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > [96875.870602] Oops: 0038 [#1] > > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > > st unloaded: scsi_wait_scan] > > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > > [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > > [96875.870889] Call Trace: > > [96875.870892] ([<0700000000000008>] 0x700000000000008) > > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > > [96875.871028] INFO: lockdep is turned off. > > [96875.871031] Last Breaking-Event-Address: > > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] [...] > OK, thanks for the backstory. > > That is the same type of testing we've been doing with some partners > for RHEL6.2 with the qla2xxx driver. They have seen the same crash that > you originally reported here: https://lkml.org/lkml/2011/10/31/64 > > The really interesting observation that was made is that the qla2xxx > driver was made lockless in RHEL6.2. We've found that reverting the > qla2xxx lockless changes eliminates the problems seen with it and I/O > stress testing with multipath path failures. > > The zfcp driver was also made lockless upstream, via this commit: > e55f875 [SCSI] zfcp: Issue FCP command without holding SCSI host_lock > > It would be great if you could try reverting e55f875 and see how your > testing goes. Ok, we did that, and of course it sometimes runs into deadlocks (since the commit fixed a deadlock), however we still see the same crash in multipath_end_io() from above. > If doing so resolves the crashes for you then the post mortem on why > these lockless SCSI driver changes are causing such odd multipath > completion failures is going to be "fun" ;) Must be something different then... ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 9:19 ` Heiko Carstens 2011-11-04 13:30 ` Mike Snitzer @ 2011-11-07 11:30 ` Jun'ichi Nomura 2011-11-07 15:36 ` Mike Snitzer 2011-11-09 9:37 ` Hannes Reinecke 2 siblings, 1 reply; 61+ messages in thread From: Jun'ichi Nomura @ 2011-11-07 11:30 UTC (permalink / raw) To: Heiko Carstens Cc: Mike Snitzer, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On 11/04/11 18:19, Heiko Carstens wrote: > On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: >> On Mon, Oct 31 2011 at 9:00am -0400, >> Heiko Carstens <heiko.carstens@de.ibm.com> wrote: >> >>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>>> Hm, dm_softirq_done is generic completion code of original >>>> request in dm-multipath. >>>> So oops here might be another manifestation of use-after-free. >>>> >>>> Do you always hit the oops at the same address? >>> >>> I think we saw this bug the first time. But before that the scsi >>> logging level was higher. Gonzalo is trying to recreate it with >>> the same (old) scsi logging level. >>> Afterwards we will try with barrier=0. >>> >>> Both on v3.0.7 btw. >>> >>>> Could you find corresponding source code line for >>>> the crashed address, dm_softirq_done+0x72/0x140, >>>> and which pointer was invalid? >>> >>> It crashes in the inlined function dm_done() when trying to >>> dereference tio (aka clone->end_io_data): >>> >>> static void dm_done(struct request *clone, int error, bool mapped) >>> { >>> int r = error; >>> struct dm_rq_target_io *tio = clone->end_io_data; >>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; >> >> Hi, >> >> Which underlying storage driver is being used by this multipath device? > > It's the s390 only zfcp device driver. > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > [96875.870602] Oops: 0038 [#1] > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > st unloaded: scsi_wait_scan] > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > [96875.870889] Call Trace: > [96875.870892] ([<0700000000000008>] 0x700000000000008) > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > [96875.871028] INFO: lockdep is turned off. > [96875.871031] Last Breaking-Event-Address: > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > > static int multipath_end_io(struct dm_target *ti, struct request *clone, > int error, union map_info *map_context) > { > struct multipath *m = ti->private; > struct dm_mpath_io *mpio = map_context->ptr; > struct pgpath *pgpath = mpio->pgpath; > struct path_selector *ps; > int r; > > r = do_end_io(m, clone, error, mpio); > if (pgpath) { > ps = &pgpath->pg->ps; <--- crashes here > if (ps->type->end_io) > ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); > } > mempool_free(mpio, m->mpio_pool); > > return r; > } > > It crashes when trying to derefence pgpath, which was freed. Since we have > SLUB debugging turned on the freed object tells us that it was allocated > via a call to multipath_ctr() and freed via a call to free_priority_group(). struct pgpath is freed before dm_target when tearing down dm table. So if the problematic completion was being done after freeing pgpath but before freeing dm_target, crash would look like that and what's happening seems the same for these dm crashes: dm table was somehow destroyed while I/O was in-flight. It's interesting that your test started to crash in dm with v3.0.7. Have you gotten these dm crashes with v3.0.6 or before? Have you hit the initially-reported scsi oops with v3.0.7? Are your v3.0.6 and v3.0.7 compiled with same config and the tests ran on same system? > > FWIW, this reproduction was done with barriers off. > > Btw, since I cc'ed you rather late I'm not sure if you are aware of the > test scenario: it's I/O stress with multipathing were paths come and go > all the time. It usually takes quite a few hours before the crashes are > observed. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-07 11:30 ` Jun'ichi Nomura @ 2011-11-07 15:36 ` Mike Snitzer 2011-11-07 16:43 ` Heiko Carstens 2011-11-07 17:10 ` Mike Snitzer 0 siblings, 2 replies; 61+ messages in thread From: Mike Snitzer @ 2011-11-07 15:36 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Nov 07 2011 at 6:30am -0500, Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > On 11/04/11 18:19, Heiko Carstens wrote: > > On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: > >> On Mon, Oct 31 2011 at 9:00am -0400, > >> Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > >> > >>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: > >>>> Hm, dm_softirq_done is generic completion code of original > >>>> request in dm-multipath. > >>>> So oops here might be another manifestation of use-after-free. > >>>> > >>>> Do you always hit the oops at the same address? > >>> > >>> I think we saw this bug the first time. But before that the scsi > >>> logging level was higher. Gonzalo is trying to recreate it with > >>> the same (old) scsi logging level. > >>> Afterwards we will try with barrier=0. > >>> > >>> Both on v3.0.7 btw. > >>> > >>>> Could you find corresponding source code line for > >>>> the crashed address, dm_softirq_done+0x72/0x140, > >>>> and which pointer was invalid? > >>> > >>> It crashes in the inlined function dm_done() when trying to > >>> dereference tio (aka clone->end_io_data): > >>> > >>> static void dm_done(struct request *clone, int error, bool mapped) > >>> { > >>> int r = error; > >>> struct dm_rq_target_io *tio = clone->end_io_data; > >>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; > >> > >> Hi, > >> > >> Which underlying storage driver is being used by this multipath device? > > > > It's the s390 only zfcp device driver. > > > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > [96875.870602] Oops: 0038 [#1] > > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > > st unloaded: scsi_wait_scan] > > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > > [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > > [96875.870889] Call Trace: > > [96875.870892] ([<0700000000000008>] 0x700000000000008) > > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > > [96875.871028] INFO: lockdep is turned off. > > [96875.871031] Last Breaking-Event-Address: > > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > > > > static int multipath_end_io(struct dm_target *ti, struct request *clone, > > int error, union map_info *map_context) > > { > > struct multipath *m = ti->private; > > struct dm_mpath_io *mpio = map_context->ptr; > > struct pgpath *pgpath = mpio->pgpath; > > struct path_selector *ps; > > int r; > > > > r = do_end_io(m, clone, error, mpio); > > if (pgpath) { > > ps = &pgpath->pg->ps; <--- crashes here > > if (ps->type->end_io) > > ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); > > } > > mempool_free(mpio, m->mpio_pool); > > > > return r; > > } > > > > It crashes when trying to derefence pgpath, which was freed. Since we have > > SLUB debugging turned on the freed object tells us that it was allocated > > via a call to multipath_ctr() and freed via a call to free_priority_group(). > > struct pgpath is freed before dm_target when tearing down dm table. > So if the problematic completion was being done after freeing pgpath > but before freeing dm_target, crash would look like that > and what's happening seems the same for these dm crashes: > dm table was somehow destroyed while I/O was in-flight. Could be the block layer's onstack plugging changes are at the heart of this. I voiced onstack plugging concerns relative to DM some time ago (https://lkml.org/lkml/2011/3/9/450) but somehow convinced myself DM was fine to no longer need dm_table_unplug_all() etc. Unfortunately I cannot recall _why_ I felt that was the case. So DM needs further review relative to block's onstack plugging changes and DM IO completion. > It's interesting that your test started to crash in dm with v3.0.7. > Have you gotten these dm crashes with v3.0.6 or before? > Have you hit the initially-reported scsi oops with v3.0.7? > Are your v3.0.6 and v3.0.7 compiled with same config and the tests > ran on same system? If all 3.0.x fail: it would be interesting to know if 2.6.39 (which introduced the onstack plugging) also has these problems. Testing with 2.6.38 would be very insightful because it obviously doesn't have any of the onstack plugging churn. Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-07 15:36 ` Mike Snitzer @ 2011-11-07 16:43 ` Heiko Carstens 2011-11-07 17:10 ` Mike Snitzer 1 sibling, 0 replies; 61+ messages in thread From: Heiko Carstens @ 2011-11-07 16:43 UTC (permalink / raw) To: Mike Snitzer Cc: Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Nov 07, 2011 at 10:36:49AM -0500, Mike Snitzer wrote: > > struct pgpath is freed before dm_target when tearing down dm table. > > So if the problematic completion was being done after freeing pgpath > > but before freeing dm_target, crash would look like that > > and what's happening seems the same for these dm crashes: > > dm table was somehow destroyed while I/O was in-flight. > > Could be the block layer's onstack plugging changes are at the heart of > this. > > I voiced onstack plugging concerns relative to DM some time ago > (https://lkml.org/lkml/2011/3/9/450) but somehow convinced myself DM was > fine to no longer need dm_table_unplug_all() etc. Unfortunately I > cannot recall _why_ I felt that was the case. > > So DM needs further review relative to block's onstack plugging changes > and DM IO completion. > > > It's interesting that your test started to crash in dm with v3.0.7. > > Have you gotten these dm crashes with v3.0.6 or before? > > Have you hit the initially-reported scsi oops with v3.0.7? > > Are your v3.0.6 and v3.0.7 compiled with same config and the tests > > ran on same system? > > If all 3.0.x fail: it would be interesting to know if 2.6.39 (which > introduced the onstack plugging) also has these problems. At least 2.6.39.1 failed as well, as reported already in June. At least the crash looked similar back then: http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > Testing with > 2.6.38 would be very insightful because it obviously doesn't have any of > the onstack plugging churn. Ok, we might try 2.6.38.8 (latest stable) and see what happens. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-07 15:36 ` Mike Snitzer 2011-11-07 16:43 ` Heiko Carstens @ 2011-11-07 17:10 ` Mike Snitzer 2011-11-07 21:44 ` Mike Snitzer 1 sibling, 1 reply; 61+ messages in thread From: Mike Snitzer @ 2011-11-07 17:10 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Mon, Nov 07 2011 at 10:36am -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Mon, Nov 07 2011 at 6:30am -0500, > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > > > On 11/04/11 18:19, Heiko Carstens wrote: > > > It's the s390 only zfcp device driver. > > > > > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > > > > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > > [96875.870602] Oops: 0038 [#1] > > > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > > > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > > > st unloaded: scsi_wait_scan] > > > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > > > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > > > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > > > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > > > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > > > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > > > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > > > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > > > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > > > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > > > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > > > [96875.870809] >000003e001347138: e31020180004 lg %r1,24(%r2) > > > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > > > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > > > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > > > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > > > [96875.870889] Call Trace: > > > [96875.870892] ([<0700000000000008>] 0x700000000000008) > > > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > > > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > > > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > > > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > > > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > > > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > > > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > > > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > > > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > > > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > > > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > > > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > > > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > > > [96875.871028] INFO: lockdep is turned off. > > > [96875.871031] Last Breaking-Event-Address: > > > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > > > > > > static int multipath_end_io(struct dm_target *ti, struct request *clone, > > > int error, union map_info *map_context) > > > { > > > struct multipath *m = ti->private; > > > struct dm_mpath_io *mpio = map_context->ptr; > > > struct pgpath *pgpath = mpio->pgpath; > > > struct path_selector *ps; > > > int r; > > > > > > r = do_end_io(m, clone, error, mpio); > > > if (pgpath) { > > > ps = &pgpath->pg->ps; <--- crashes here > > > if (ps->type->end_io) > > > ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); > > > } > > > mempool_free(mpio, m->mpio_pool); > > > > > > return r; > > > } > > > > > > It crashes when trying to derefence pgpath, which was freed. Since we have > > > SLUB debugging turned on the freed object tells us that it was allocated > > > via a call to multipath_ctr() and freed via a call to free_priority_group(). > > > > struct pgpath is freed before dm_target when tearing down dm table. > > So if the problematic completion was being done after freeing pgpath > > but before freeing dm_target, crash would look like that > > and what's happening seems the same for these dm crashes: > > dm table was somehow destroyed while I/O was in-flight. > > Could be the block layer's onstack plugging changes are at the heart of > this. > > I voiced onstack plugging concerns relative to DM some time ago > (https://lkml.org/lkml/2011/3/9/450) but somehow convinced myself DM was > fine to no longer need dm_table_unplug_all() etc. Unfortunately I > cannot recall _why_ I felt that was the case. > > So DM needs further review relative to block's onstack plugging changes > and DM IO completion. dm_suspend is performed as part of the DM table reload that is being done my multipathd during path failure. Seems DM no longer insures inflight requests have finished during dm_suspend(). Before onstack plugging (< 2.6.39): dm_suspend() -> dm_wait_for_completion() -> dm_unplug_all() -> dm_table_unplug_all() After onstack plugging (>= 2.6.39, commit 7eaceaccab5f40bb): dm_suspend's call to dm_wait_for_completion() no longer unplugs IO (dm_unplug_all and dm_table_unplug_all were removed without introducing a clear equivalent). Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-07 17:10 ` Mike Snitzer @ 2011-11-07 21:44 ` Mike Snitzer 0 siblings, 0 replies; 61+ messages in thread From: Mike Snitzer @ 2011-11-07 21:44 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Heiko Carstens, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Mon, Nov 07 2011 at 12:10pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Mon, Nov 07 2011 at 10:36am -0500, > Mike Snitzer <snitzer@redhat.com> wrote: > > Could be the block layer's onstack plugging changes are at the heart of > > this. > > > > I voiced onstack plugging concerns relative to DM some time ago > > (https://lkml.org/lkml/2011/3/9/450) but somehow convinced myself DM was > > fine to no longer need dm_table_unplug_all() etc. Unfortunately I > > cannot recall _why_ I felt that was the case. > > > > So DM needs further review relative to block's onstack plugging changes > > and DM IO completion. > > dm_suspend is performed as part of the DM table reload that is being > done my multipathd during path failure. Seems DM no longer insures > inflight requests have finished during dm_suspend(). > > Before onstack plugging (< 2.6.39): > dm_suspend() -> dm_wait_for_completion() -> dm_unplug_all() -> dm_table_unplug_all() > > After onstack plugging (>= 2.6.39, commit 7eaceaccab5f40bb): > dm_suspend's call to dm_wait_for_completion() no longer unplugs IO > (dm_unplug_all and dm_table_unplug_all were removed without introducing > a clear equivalent). If a missing unplug were a concern then dm_wait_for_completion()'s check for inflight IO would cause dm_suspend() to hang. So the onstack plugging changes are unlikely to be the reason for this. Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-04 9:19 ` Heiko Carstens 2011-11-04 13:30 ` Mike Snitzer 2011-11-07 11:30 ` Jun'ichi Nomura @ 2011-11-09 9:37 ` Hannes Reinecke 2011-11-10 16:10 ` Heiko Carstens 2 siblings, 1 reply; 61+ messages in thread From: Hannes Reinecke @ 2011-11-09 9:37 UTC (permalink / raw) To: Heiko Carstens Cc: Mike Snitzer, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On 11/04/2011 10:19 AM, Heiko Carstens wrote: > On Thu, Nov 03, 2011 at 02:25:48PM -0400, Mike Snitzer wrote: >> On Mon, Oct 31 2011 at 9:00am -0400, >> Heiko Carstens<heiko.carstens@de.ibm.com> wrote: >> >>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote: >>>> Hm, dm_softirq_done is generic completion code of original >>>> request in dm-multipath. >>>> So oops here might be another manifestation of use-after-free. >>>> >>>> Do you always hit the oops at the same address? >>> >>> I think we saw this bug the first time. But before that the scsi >>> logging level was higher. Gonzalo is trying to recreate it with >>> the same (old) scsi logging level. >>> Afterwards we will try with barrier=0. >>> >>> Both on v3.0.7 btw. >>> >>>> Could you find corresponding source code line for >>>> the crashed address, dm_softirq_done+0x72/0x140, >>>> and which pointer was invalid? >>> >>> It crashes in the inlined function dm_done() when trying to >>> dereference tio (aka clone->end_io_data): >>> >>> static void dm_done(struct request *clone, int error, bool mapped) >>> { >>> int r = error; >>> struct dm_rq_target_io *tio = clone->end_io_data; >>> dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io; >> >> Hi, >> >> Which underlying storage driver is being used by this multipath device? > > It's the s390 only zfcp device driver. > > FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > [96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > [96875.870602] Oops: 0038 [#1] > [96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > [96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > st unloaded: scsi_wait_scan] > [96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > [96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > [96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > [96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > [96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > [96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > [96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > [96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > [96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > [96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > [96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > [96875.870809]>000003e001347138: e31020180004 lg %r1,24(%r2) > [96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > [96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > [96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > [96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > [96875.870889] Call Trace: > [96875.870892] ([<0700000000000008>] 0x700000000000008) > [96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > [96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > [96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > [96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > [96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > [96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > [96875.870953] [<000000000061a164>] io_return+0x0/0x16 > [96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > [96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > [96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > [96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > [96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > [96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > [96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > [96875.871028] INFO: lockdep is turned off. > [96875.871031] Last Breaking-Event-Address: > [96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] > > static int multipath_end_io(struct dm_target *ti, struct request *clone, > int error, union map_info *map_context) > { > struct multipath *m = ti->private; > struct dm_mpath_io *mpio = map_context->ptr; > struct pgpath *pgpath = mpio->pgpath; > struct path_selector *ps; > int r; > > r = do_end_io(m, clone, error, mpio); > if (pgpath) { > ps =&pgpath->pg->ps;<--- crashes here > if (ps->type->end_io) > ps->type->end_io(ps,&pgpath->path, mpio->nr_bytes); > } > mempool_free(mpio, m->mpio_pool); > > return r; > } > > It crashes when trying to derefence pgpath, which was freed. Since we have > SLUB debugging turned on the freed object tells us that it was allocated > via a call to multipath_ctr() and freed via a call to free_priority_group(). > > FWIW, this reproduction was done with barriers off. > > Btw, since I cc'ed you rather late I'm not sure if you are aware of the > test scenario: it's I/O stress with multipathing were paths come and go > all the time. It usually takes quite a few hours before the crashes are > observed. Hmm. Just to be on the safe side, could you try this one: diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 5e0090e..e6fad46 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, struct reque st *clone, map_context->ptr = mpio; clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; r = map_io(m, clone, mpio, 0); - if (r < 0 || r == DM_MAPIO_REQUEUE) + if (r < 0 || r == DM_MAPIO_REQUEUE) { mempool_free(mpio, m->mpio_pool); + map_context->ptr = NULL; + } return r; } Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-09 9:37 ` Hannes Reinecke @ 2011-11-10 16:10 ` Heiko Carstens 2011-11-17 16:29 ` Mike Snitzer 0 siblings, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-11-10 16:10 UTC (permalink / raw) To: Hannes Reinecke Cc: Mike Snitzer, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Wed, Nov 09, 2011 at 10:37:06AM +0100, Hannes Reinecke wrote: > >FWIW, yet another use-after-free crash, this time however in multipath_end_io: > > > >[96875.870593] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > >[96875.870602] Oops: 0038 [#1] > >[96875.870674] PREEMPT SMP DEBUG_PAGEALLOC > >[96875.870683] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [la\ > >st unloaded: scsi_wait_scan] > >[96875.870722] CPU: 2 Tainted: G W 3.0.7-50.x.20111024-s390xdefault #1 > >[96875.870728] Process udevd (pid: 36697, task: 0000000072c8a3a8, ksp: 0000000057c43868) > >[96875.870732] Krnl PSW : 0704200180000000 000003e001347138 (multipath_end_io+0x50/0x140 [dm_multipath]) > >[96875.870746] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > >[96875.870751] Krnl GPRS: 0000000000000000 000003e000000000 6b6b6b6b6b6b6b6b 00000000717ab940 > >[96875.870755] 0000000000000000 00000000717abab0 0000000000000002 0700000000000008 > >[96875.870759] 0000000000000002 0000000000000000 0000000058dd37a8 000000006f845478 > >[96875.870764] 000003e0012e1000 000000005613d1f0 000000007a737bf0 000000007a737ba0 > >[96875.870768] Krnl Code: 000003e00134712a: b90200dd ltgr %r13,%r13 > >[96875.870793] 000003e00134712e: a7840017 brc 8,3e00134715c > >[96875.870800] 000003e001347132: e320d0100004 lg %r2,16(%r13) > >[96875.870809]>000003e001347138: e31020180004 lg %r1,24(%r2) > >[96875.870818] 000003e00134713e: e31010580004 lg %r1,88(%r1) > >[96875.870827] 000003e001347144: b9020011 ltgr %r1,%r1 > >[96875.870835] 000003e001347148: a784000a brc 8,3e00134715c > >[96875.870841] 000003e00134714c: 41202018 la %r2,24(%r2) > >[96875.870889] Call Trace: > >[96875.870892] ([<0700000000000008>] 0x700000000000008) > >[96875.870897] [<000003e0012e3662>] dm_softirq_done+0x9a/0x140 [dm_mod] > >[96875.870915] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > >[96875.870925] [<00000000001587c2>] __do_softirq+0xda/0x398 > >[96875.870932] [<000000000010f47e>] do_softirq+0xe2/0xe8 > >[96875.870940] [<0000000000158e2c>] irq_exit+0xc8/0xcc > >[96875.870945] [<00000000004ceb48>] do_IRQ+0x910/0x1bfc > >[96875.870953] [<000000000061a164>] io_return+0x0/0x16 > >[96875.870961] [<000000000019c84e>] lock_acquire+0xd2/0x204 > >[96875.870969] ([<000000000019c836>] lock_acquire+0xba/0x204) > >[96875.870974] [<0000000000615f8e>] mutex_lock_killable_nested+0x92/0x520 > >[96875.870983] [<0000000000292796>] vfs_readdir+0x8a/0xe4 > >[96875.870992] [<00000000002928e0>] SyS_getdents+0x60/0xe8 > >[96875.870999] [<0000000000619af2>] sysc_noemu+0x16/0x1c > >[96875.871024] [<000003fffd1ec83e>] 0x3fffd1ec83e > >[96875.871028] INFO: lockdep is turned off. > >[96875.871031] Last Breaking-Event-Address: > >[96875.871037] [<000003e0012e3660>] dm_softirq_done+0x98/0x140 [dm_mod] [...] > Hmm. Just to be on the safe side, could you try this one: > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > index 5e0090e..e6fad46 100644 > --- a/drivers/md/dm-mpath.c > +++ b/drivers/md/dm-mpath.c > @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, > struct reque > st *clone, > map_context->ptr = mpio; > clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; > r = map_io(m, clone, mpio, 0); > - if (r < 0 || r == DM_MAPIO_REQUEUE) > + if (r < 0 || r == DM_MAPIO_REQUEUE) { > mempool_free(mpio, m->mpio_pool); > + map_context->ptr = NULL; > + } > > return r; > } With your patch we haven't been able to reproduce the kernel crash until now. Now we "only" run into I/O stalls, which before your patch we also did. But repeatedly rebooting and retrying and ignoring the I/O stalls always lead to a crash. Gonzalo will run a couple of extra rounds so we can have a feeling if at least one of the bugs could be fixed with your patch ;) ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-10 16:10 ` Heiko Carstens @ 2011-11-17 16:29 ` Mike Snitzer 2011-11-29 12:00 ` Heiko Carstens 0 siblings, 1 reply; 61+ messages in thread From: Mike Snitzer @ 2011-11-17 16:29 UTC (permalink / raw) To: Heiko Carstens Cc: Hannes Reinecke, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Thu, Nov 10 2011 at 11:10am -0500, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Wed, Nov 09, 2011 at 10:37:06AM +0100, Hannes Reinecke wrote: ... > > Hmm. Just to be on the safe side, could you try this one: > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > index 5e0090e..e6fad46 100644 > > --- a/drivers/md/dm-mpath.c > > +++ b/drivers/md/dm-mpath.c > > @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, > > struct reque > > st *clone, > > map_context->ptr = mpio; > > clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; > > r = map_io(m, clone, mpio, 0); > > - if (r < 0 || r == DM_MAPIO_REQUEUE) > > + if (r < 0 || r == DM_MAPIO_REQUEUE) { > > mempool_free(mpio, m->mpio_pool); > > + map_context->ptr = NULL; > > + } > > > > return r; > > } > > With your patch we haven't been able to reproduce the kernel crash until now. > Now we "only" run into I/O stalls, which before your patch we also did. But > repeatedly rebooting and retrying and ignoring the I/O stalls always lead to > a crash. > Gonzalo will run a couple of extra rounds so we can have a feeling if at least > one of the bugs could be fixed with your patch ;) Hi, Any update after further testing with Hannes' patch? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-17 16:29 ` Mike Snitzer @ 2011-11-29 12:00 ` Heiko Carstens 2011-11-29 20:18 ` Mike Snitzer 0 siblings, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-11-29 12:00 UTC (permalink / raw) To: Mike Snitzer Cc: Hannes Reinecke, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas > > > Hmm. Just to be on the safe side, could you try this one: > > > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > > index 5e0090e..e6fad46 100644 > > > --- a/drivers/md/dm-mpath.c > > > +++ b/drivers/md/dm-mpath.c > > > @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, > > > struct reque > > > st *clone, > > > map_context->ptr = mpio; > > > clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; > > > r = map_io(m, clone, mpio, 0); > > > - if (r < 0 || r == DM_MAPIO_REQUEUE) > > > + if (r < 0 || r == DM_MAPIO_REQUEUE) { > > > mempool_free(mpio, m->mpio_pool); > > > + map_context->ptr = NULL; > > > + } > > > > > > return r; > > > } > > > > With your patch we haven't been able to reproduce the kernel crash until now. > > Now we "only" run into I/O stalls, which before your patch we also did. But > > repeatedly rebooting and retrying and ignoring the I/O stalls always lead to > > a crash. > > Gonzalo will run a couple of extra rounds so we can have a feeling if at least > > one of the bugs could be fixed with your patch ;) > > Hi, > > Any update after further testing with Hannes' patch? Sorry for the late update, our internal IBM IMAP servers have been down for nearly a week :/ So, we were unable to reproduce the original bug with the patch applied during various runs. However, we ran into this one instead, which is yet another use-after-free bug (I need to double check, but I'm quite sure that a freed struct scsi_cmnd caused this). [ 4906.683654] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 [ 4906.683662] Oops: 0038 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 4906.683672] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [last unloaded: scsi_wait_scan] [ 4906.683696] CPU: 3 Not tainted 3.1.0-52.x.20111111-s390xdefault #1 [ 4906.683700] Process flush-252:12 (pid: 2489, task: 0000000072b4a490, ksp: 0000000072f8fb48) [ 4906.683705] Krnl PSW : 0404200180000000 000000000052a98c (zfcp_fsf_fcp_handler_common+0x3c/0x2f4) [ 4906.683719] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 [ 4906.683728] Krnl GPRS: 0000000000000000 00000000726dc800 0000000037e1c4e8 0400043100d78e40 [ 4906.683733] 0000000070ccc000 0000000000000010 0700000074b4dcd0 00000000726dc800 [ 4906.683738] 0000000037e1c4e8 070000000d427960 0000000074b4dcd0 0000000070ccc000 [ 4906.683743] 6b6b6b6b6b6b6b6b 0000000000688560 000000000d427980 000000000d427920 [ 4906.683761] Krnl Code: 000000000052a97c: 58502090 l %r5,144(%r2) [ 4906.683767] 000000000052a980: e3c010000004 lg %r12,0(%r1) [ 4906.683773] 000000000052a986: e34020980004 lg %r4,152(%r2) [ 4906.683780] >000000000052a98c: e330c0000004 lg %r3,0(%r12) [ 4906.683786] 000000000052a992: a7510008 tmll %r5,8 [ 4906.683792] 000000000052a996: e33032080004 lg %r3,520(%r3) [ 4906.683798] 000000000052a99c: 58303204 l %r3,516(%r3) [ 4906.683803] 000000000052a9a0: a774001c brc 7,52a9d8 [ 4906.683809] Call Trace: [ 4906.683811] ([<000000000d427980>] 0xd427980) [ 4906.683817] [<000000000052aff2>] zfcp_fsf_fcp_cmnd_handler+0x52/0x448 [ 4906.683824] [<000000000052c3f8>] zfcp_fsf_req_complete+0x1d8/0x7e4 [ 4906.683829] [<000000000052ef2c>] zfcp_fsf_reqid_check+0xc4/0x13c [ 4906.683835] [<000000000052fe92>] zfcp_qdio_int_resp+0x72/0x1a4 [ 4906.683841] [<00000000004eb6fe>] qdio_kick_handler+0x12e/0x2e0 [ 4906.683848] [<00000000004ecfb2>] __tiqdio_inbound_processing+0xea/0xd98 [ 4906.683854] [<00000000001552f2>] tasklet_action+0xd2/0x29c [ 4906.683862] [<00000000001563e2>] __do_softirq+0xda/0x398 [ 4906.683868] [<000000000010f47e>] do_softirq+0xe2/0xe8 [ 4906.683876] [<0000000000156a4c>] irq_exit+0xc8/0xcc [ 4906.683881] [<00000000004d79fa>] do_IRQ+0x20e/0x320 [ 4906.683889] [<000000000061de8c>] io_return+0x0/0x16 [ 4906.683897] [<000000000061cf78>] _raw_spin_unlock_irqrestore+0x98/0xa8 [ 4906.683904] ([<000000000061cf6e>] _raw_spin_unlock_irqrestore+0x8e/0xa8) [ 4906.683910] [<0000000000218262>] test_set_page_writeback+0x10e/0x248 [ 4906.683919] [<00000000002b9254>] __block_write_full_page+0x310/0x5cc [ 4906.683926] [<00000000002b9628>] block_write_full_page_endio+0x118/0x168 [ 4906.683932] [<000000000031050e>] ext3_writeback_writepage+0x1fa/0x28c [ 4906.683940] [<0000000000218006>] __writepage+0x2e/0x88 [ 4906.683945] [<0000000000218be0>] write_cache_pages+0x224/0x600 [ 4906.683951] [<000000000021901c>] generic_writepages+0x60/0x94 [ 4906.683957] [<00000000002ace14>] writeback_single_inode+0x13c/0x53c [ 4906.683964] [<00000000002adb80>] writeback_sb_inodes+0x1d4/0x2e4 [ 4906.683970] [<00000000002ae44c>] __writeback_inodes_wb+0xa0/0xec [ 4906.683976] [<00000000002ae926>] wb_writeback+0x48e/0x5f8 [ 4906.683981] [<00000000002af03a>] wb_do_writeback+0x302/0x3ac [ 4906.683987] [<00000000002af194>] bdi_writeback_thread+0xb0/0x4e0 [ 4906.683993] [<000000000017a3ea>] kthread+0xa6/0xb0 [ 4906.683999] [<000000000061d436>] kernel_thread_starter+0x6/0xc [ 4906.684005] [<000000000061d430>] kernel_thread_starter+0x0/0xc [ 4906.684010] INFO: lockdep is turned off. [ 4906.684013] Last Breaking-Event-Address: [ 4906.684016] [<000000000052afec>] zfcp_fsf_fcp_cmnd_handler+0x4c/0x448 Gonzalo also tried 2.6.38.8 as suggested and ran into this one: [ 292.877936] ------------[ cut here ]------------ [ 292.877939] Kernel BUG at 6b6b6b6b6b6b6b6d [verbose debug info unavailable] [ 292.877947] specification exception: 0006 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 292.877956] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [last unloaded: scsi_wait_scan] [ 292.877979] CPU: 1 Not tainted 2.6.38.8 #1 [ 292.877982] Process multipathd (pid: 352, task: 000000007bab8000, ksp: 000000007ba3ba00) [ 292.877988] Krnl PSW : 0704000180000000 6b6b6b6b6b6b6b6d (0x6b6b6b6b6b6b6b6d) [ 292.877997] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 [ 292.878003] Krnl GPRS: 17c0000000000000 6b6b6b6b6b6b6b6b 0000000078dc49f0 0000000000000000 [ 292.878008] 000003c001f6a728 00000000005ec230 00000000738e2910 00000000756d4aa0 [ 292.878013] 000003c000000001 000000007ba3bc58 00000000738e2910 00000000738e2a08 [ 292.878018] 000003c001f63000 0000000078dc49f0 00000000003e6c0a 000000007ba3bb80 [ 292.878024] Krnl Code: Bad PSW. [ 292.878027] Call Trace: [ 292.878030] ([<00000000003e6c0a>] blk_unplug+0x42/0x150) [ 292.878040] [<000003c001f6a728>] dm_table_unplug_all+0x60/0x10c [dm_mod] [ 292.878060] [<000003c001f65926>] dm_unplug_all+0x86/0xa8 [dm_mod] [ 292.878069] [<000003c001f68508>] dm_suspend+0x1a4/0x394 [dm_mod] [ 292.878078] [<000003c001f6dce6>] dev_suspend+0x21e/0x250 [dm_mod] [ 292.878087] [<000003c001f6eaa8>] ctl_ioctl+0x1c8/0x28c [dm_mod] [ 292.878096] [<000003c001f6eb96>] dm_ctl_ioctl+0x2a/0x38 [dm_mod] [ 292.878105] [<000000000027df74>] do_vfs_ioctl+0x94/0x5b8 [ 292.878112] [<000000000027e52c>] SyS_ioctl+0x94/0xac [ 292.878117] [<00000000005d8f5e>] sysc_noemu+0x16/0x1c [ 292.878125] [<000003fffd2097ca>] 0x3fffd2097ca [ 292.878130] INFO: lockdep is turned off. [ 292.878133] Last Breaking-Event-Address: [ 292.878136] [<00000000003e6c08>] blk_unplug+0x40/0x150 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-29 12:00 ` Heiko Carstens @ 2011-11-29 20:18 ` Mike Snitzer 2011-11-30 7:25 ` Hannes Reinecke 2011-12-12 12:39 ` Heiko Carstens 0 siblings, 2 replies; 61+ messages in thread From: Mike Snitzer @ 2011-11-29 20:18 UTC (permalink / raw) To: Heiko Carstens Cc: Hannes Reinecke, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Tue, Nov 29 2011 at 7:00am -0500, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > > Hmm. Just to be on the safe side, could you try this one: > > > > > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > > > index 5e0090e..e6fad46 100644 > > > > --- a/drivers/md/dm-mpath.c > > > > +++ b/drivers/md/dm-mpath.c > > > > @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, > > > > struct reque > > > > st *clone, > > > > map_context->ptr = mpio; > > > > clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; > > > > r = map_io(m, clone, mpio, 0); > > > > - if (r < 0 || r == DM_MAPIO_REQUEUE) > > > > + if (r < 0 || r == DM_MAPIO_REQUEUE) { > > > > mempool_free(mpio, m->mpio_pool); > > > > + map_context->ptr = NULL; > > > > + } > > > > > > > > return r; > > > > } > > > > > > With your patch we haven't been able to reproduce the kernel crash until now. > > > Now we "only" run into I/O stalls, which before your patch we also did. But > > > repeatedly rebooting and retrying and ignoring the I/O stalls always lead to > > > a crash. > > > Gonzalo will run a couple of extra rounds so we can have a feeling if at least > > > one of the bugs could be fixed with your patch ;) > > > > Hi, > > > > Any update after further testing with Hannes' patch? > > Sorry for the late update, our internal IBM IMAP servers have been down > for nearly a week :/ > > So, we were unable to reproduce the original bug with the patch applied > during various runs. OK, so it seems to be a benenficial change (and obviously correct to me). Hannes, care to formally post your fix to dm-devel so we can get it in 3.2-rc? > However, we ran into this one instead, which is yet another use-after-free bug > (I need to double check, but I'm quite sure that a freed struct scsi_cmnd > caused this). OK, yeah something is causing poisoned (POISON_FREE) memory to be used. > [ 4906.683654] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 ... > Gonzalo also tried 2.6.38.8 as suggested and ran into this one: > > [ 292.877936] ------------[ cut here ]------------ > [ 292.877939] Kernel BUG at 6b6b6b6b6b6b6b6d [verbose debug info unavailable] Again, more poison. Seems this test is causing us to fall on our face no matter what. Likely, best to leave this 2.6.38 blk_unplug crash to one side and continue focusing on latest upstream. Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-29 20:18 ` Mike Snitzer @ 2011-11-30 7:25 ` Hannes Reinecke 2011-12-12 12:39 ` Heiko Carstens 1 sibling, 0 replies; 61+ messages in thread From: Hannes Reinecke @ 2011-11-30 7:25 UTC (permalink / raw) To: Mike Snitzer Cc: Heiko Carstens, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On 11/29/2011 09:18 PM, Mike Snitzer wrote: > On Tue, Nov 29 2011 at 7:00am -0500, > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > >>>>> Hmm. Just to be on the safe side, could you try this one: >>>>> >>>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c >>>>> index 5e0090e..e6fad46 100644 >>>>> --- a/drivers/md/dm-mpath.c >>>>> +++ b/drivers/md/dm-mpath.c >>>>> @@ -920,8 +920,10 @@ static int multipath_map(struct dm_target *ti, >>>>> struct reque >>>>> st *clone, >>>>> map_context->ptr = mpio; >>>>> clone->cmd_flags |= REQ_FAILFAST_TRANSPORT; >>>>> r = map_io(m, clone, mpio, 0); >>>>> - if (r < 0 || r == DM_MAPIO_REQUEUE) >>>>> + if (r < 0 || r == DM_MAPIO_REQUEUE) { >>>>> mempool_free(mpio, m->mpio_pool); >>>>> + map_context->ptr = NULL; >>>>> + } >>>>> >>>>> return r; >>>>> } >>>> >>>> With your patch we haven't been able to reproduce the kernel crash until now. >>>> Now we "only" run into I/O stalls, which before your patch we also did. But >>>> repeatedly rebooting and retrying and ignoring the I/O stalls always lead to >>>> a crash. >>>> Gonzalo will run a couple of extra rounds so we can have a feeling if at least >>>> one of the bugs could be fixed with your patch ;) >>> >>> Hi, >>> >>> Any update after further testing with Hannes' patch? >> >> Sorry for the late update, our internal IBM IMAP servers have been down >> for nearly a week :/ >> >> So, we were unable to reproduce the original bug with the patch applied >> during various runs. > > OK, so it seems to be a benenficial change (and obviously correct to > me). Hannes, care to formally post your fix to dm-devel so we can get > it in 3.2-rc? > Yep, will do. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-11-29 20:18 ` Mike Snitzer 2011-11-30 7:25 ` Hannes Reinecke @ 2011-12-12 12:39 ` Heiko Carstens 2011-12-13 16:50 ` Mike Snitzer 1 sibling, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-12-12 12:39 UTC (permalink / raw) To: Mike Snitzer Cc: Hannes Reinecke, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Tue, Nov 29, 2011 at 03:18:03PM -0500, Mike Snitzer wrote: > On Tue, Nov 29 2011 at 7:00am -0500, > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > [ 4906.683654] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > ... > > > Gonzalo also tried 2.6.38.8 as suggested and ran into this one: > > > > [ 292.877936] ------------[ cut here ]------------ > > [ 292.877939] Kernel BUG at 6b6b6b6b6b6b6b6d [verbose debug info unavailable] > > Again, more poison. > > Seems this test is causing us to fall on our face no matter what. > Likely, best to leave this 2.6.38 blk_unplug crash to one side and > continue focusing on latest upstream. Sorry again, for taking so long to come back. This time however with good news: With 3.2.0-rc4.00255.g77a7300 we were unable to reproduce any I/O stall or user-after-free bugs even after nearly 3000 test iterations. The only patches on top we have are: two patches from Hannes: http://www.spinics.net/lists/linux-scsi/msg55112.html http://www.spinics.net/lists/linux-scsi/msg55413.html and the patch below from Steffen: Btw. James, any chance to get this one upstream soon? It should be in your queue for quite some time already, IIRC. Subject: [PATCH] zfcp: return early from slave_destroy if slave_alloc returned early From: Steffen Maier <maier@linux.vnet.ibm.com> zfcp_scsi_slave_destroy erroneously always tried to finish its task even if the corresponding previous zfcp_scsi_slave_alloc returned early. This can lead to kernel page faults on accessing uninitialized fields of struct zfcp_scsi_dev in zfcp_erp_lun_shutdown_wait. Take the port field of the struct to determine if slave_alloc returned early. This zfcp bug is exposed by 4e6c82b (in turn fixing f7c9c6b to be compatible with 21208ae) which can call slave_destroy for a corresponding previous slave_alloc that did not finish. This patch is based on James Bottomley's fix suggestion in http://www.spinics.net/lists/linux-scsi/msg55449.html. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Cc: <stable@kernel.org> #2.6.38+ --- drivers/s390/scsi/zfcp_scsi.c | 4 ++++ 1 file changed, 4 insertions(+) diff -urpN linux-2.6/drivers/s390/scsi/zfcp_scsi.c linux-2.6-patched/drivers/s390/scsi/zfcp_scsi.c --- linux-2.6/drivers/s390/scsi/zfcp_scsi.c 2011-12-01 13:08:32.000000000 +0100 +++ linux-2.6-patched/drivers/s390/scsi/zfcp_scsi.c 2011-12-01 13:08:52.000000000 +0100 @@ -55,6 +55,10 @@ static void zfcp_scsi_slave_destroy(stru { struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev); + /* if previous slave_alloc returned early, there is nothing to do */ + if (!zfcp_sdev->port) + return; + zfcp_erp_lun_shutdown_wait(sdev, "scssd_1"); put_device(&zfcp_sdev->port->dev); } ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-12-12 12:39 ` Heiko Carstens @ 2011-12-13 16:50 ` Mike Snitzer 0 siblings, 0 replies; 61+ messages in thread From: Mike Snitzer @ 2011-12-13 16:50 UTC (permalink / raw) To: Heiko Carstens Cc: Hannes Reinecke, Jun'ichi Nomura, James Bottomley, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Mon, Dec 12 2011 at 7:39am -0500, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Tue, Nov 29, 2011 at 03:18:03PM -0500, Mike Snitzer wrote: > > On Tue, Nov 29 2011 at 7:00am -0500, > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > [ 4906.683654] Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000 > > > > ... > > > > > Gonzalo also tried 2.6.38.8 as suggested and ran into this one: > > > > > > [ 292.877936] ------------[ cut here ]------------ > > > [ 292.877939] Kernel BUG at 6b6b6b6b6b6b6b6d [verbose debug info unavailable] > > > > Again, more poison. > > > > Seems this test is causing us to fall on our face no matter what. > > Likely, best to leave this 2.6.38 blk_unplug crash to one side and > > continue focusing on latest upstream. > > Sorry again, for taking so long to come back. This time however with good news: > > With 3.2.0-rc4.00255.g77a7300 we were unable to reproduce any I/O stall or > user-after-free bugs even after nearly 3000 test iterations. Great news, so with an eye towards getting these fixes upstream: > The only patches on top we have are: > > two patches from Hannes: > http://www.spinics.net/lists/linux-scsi/msg55112.html Has that scsi_lib.c patch been posted with a formal patch header? James, I'm not clear on where I should be looking to see what you have staged but not yet sent to Linus. Does such a branch exist in your scsi-misc-2.6 tree? > http://www.spinics.net/lists/linux-scsi/msg55413.html Jun'ichi and Hannes said that additional NULL pointer check is needed: http://www.redhat.com/archives/dm-devel/2011-December/msg00022.html Hannes said he'd re-post an updated patch (but hasn't yet). Mike ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 10:42 ` James Bottomley 2011-10-31 11:46 ` Jun'ichi Nomura @ 2011-10-31 13:21 ` Mike Snitzer 2011-10-31 13:40 ` Heiko Carstens 1 sibling, 1 reply; 61+ messages in thread From: Mike Snitzer @ 2011-10-31 13:21 UTC (permalink / raw) To: James Bottomley Cc: Heiko Carstens, Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Oct 31 2011 at 6:42am -0400, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Mon, 2011-10-31 at 11:05 +0100, Heiko Carstens wrote: > > On Tue, Oct 18, 2011 at 11:29:40AM -0500, James Bottomley wrote: > > > On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote: > > > > On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote: > > > > > On 10/17/11 23:06, James Bottomley wrote: > > > > > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote: > > > > > >> On 10/15/11 01:03, James Bottomley wrote: > > > > > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote: > > > > > >>>> Initially, we encountered use-after-free bugs in > > > > > >>>> scsi_print_command / scsi_dispatch_cmd > > > > > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2 > > > > > >> > > > > > >> It is interesting that both this and the older report > > > > > >> got oopsed in scsi_log_send(), while there are other > > > > > >> dereferences of 'cmd' around scsi_dispatch_cmd(). > > > > > >> Are there any reason they are special? Just by accident? > > > > > > > > > > > > Right, that's why it looks like the command area got freed rather than > > > > > > the command pointer was bogus (6b is a poison free pattern). Perhaps if > > > > > > the reporter could pin down the failing source line, we'd know better > > > > > > what was going on? > > > > > > > > > > Yeah, that might be useful. > > > > > > > > The struct scsi_cmnd that was passed to scsi_log_send() was already freed > > > > (contents completely 6b6b6b...). > > > > Since SLUB debugging was turned on we can see that it was freed from > > > > __scsi_put_command(). Not too much of a surprise. > > > > > > But it does tell us the put must be racing with dispatch, since > > > dereferencing the command to find the device worked higher up in > > > scsi_dispatch_cmd(). > > > > > > There is one way to invalidate the theory that we cloned something with > > > an attached command, and that's to put > > > > > > BUG_ON(rq->special) > > > > > > in blk_insert_cloned_request(). I think we're careful about clearing > > > it, so it should work (perhaps a warn on just in case). > > > > It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: > > > > [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 > > [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las > > t unloaded: scsi_wait_scan] > > [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 > > [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) > > [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 > > [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d > > [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 > > [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 > > [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 > > [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 > > [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) > > [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) > > [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) > > [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) > > [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 > > [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 > > [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) > > [ 4024.938060] Call Trace: > > [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) > > [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 > > [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 > > [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > [ 4024.938118] INFO: lockdep is turned off. > > [ 4024.938121] Last Breaking-Event-Address: > > [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] > > [ 4024.938135] > > [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt > > [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 > > [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 > > [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a > > [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 > > [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 > > [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 > > [ 4024.938205] Call Trace: > > [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) > > [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 > > [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 > > [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 > > [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 > > [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 > > [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] > > [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) > > [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 > > [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 > > [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > [ 4024.938373] INFO: lockdep is turned off. > > > > So we thought we might as well upgrade to 3.1 but immediately got a > > > > kernel BUG at block/blk-flush.c:323! > > > > which was handled here https://lkml.org/lkml/2011/10/4/105 and > > here https://lkml.org/lkml/2011/10/12/408 . > > > > But no patches for that one went upstream AFAICS. > > Well, all I can say is "hm". You put only a BUG_ON() in the code, which > wasn't triggered, but now we get a completely different oops. However, > I think it does point to the dm barrier handling code. Can you turn off > barriers and see if all oopses go away? There are two 3.1-stable fixes from Jeff Moyer that Jens staged for Linus to pick up (but seems Jens hasn't sent his 3.2 pull to Linus yet): http://git.kernel.dk/?p=linux-block.git;a=commit;h=8f02b3a09b1b7d2a4d24b8cd7008f2a441f19a14 http://git.kernel.dk/?p=linux-block.git;a=commit;h=f26d8f0562da76731cb049943a0e9d9fa81d946a ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 13:21 ` Mike Snitzer @ 2011-10-31 13:40 ` Heiko Carstens 2011-10-31 14:01 ` Mike Snitzer 0 siblings, 1 reply; 61+ messages in thread From: Heiko Carstens @ 2011-10-31 13:40 UTC (permalink / raw) To: Mike Snitzer Cc: James Bottomley, Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas On Mon, Oct 31, 2011 at 09:21:58AM -0400, Mike Snitzer wrote: > > > It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: > > > > > > [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 > > > [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > > [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las > > > t unloaded: scsi_wait_scan] > > > [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 > > > [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > > [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) > > > [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 > > > [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d > > > [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 > > > [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 > > > [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 > > > [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 > > > [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) > > > [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) > > > [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) > > > [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) > > > [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 > > > [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 > > > [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) > > > [ 4024.938060] Call Trace: > > > [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) > > > [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > > [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 > > > [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > > [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 > > > [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > > [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > > [ 4024.938118] INFO: lockdep is turned off. > > > [ 4024.938121] Last Breaking-Event-Address: > > > [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] > > > [ 4024.938135] > > > [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt > > > [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 > > > [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > > [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 > > > [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a > > > [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 > > > [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 > > > [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 > > > [ 4024.938205] Call Trace: > > > [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) > > > [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 > > > [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 > > > [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 > > > [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 > > > [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 > > > [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] > > > [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) > > > [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > > [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 > > > [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > > [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 > > > [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > > [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > > [ 4024.938373] INFO: lockdep is turned off. > > > > > > So we thought we might as well upgrade to 3.1 but immediately got a > > > > > > kernel BUG at block/blk-flush.c:323! > > > > > > which was handled here https://lkml.org/lkml/2011/10/4/105 and > > > here https://lkml.org/lkml/2011/10/12/408 . > > > > > > But no patches for that one went upstream AFAICS. > > > > Well, all I can say is "hm". You put only a BUG_ON() in the code, which > > wasn't triggered, but now we get a completely different oops. However, > > I think it does point to the dm barrier handling code. Can you turn off > > barriers and see if all oopses go away? > > There are two 3.1-stable fixes from Jeff Moyer that Jens staged for > Linus to pick up (but seems Jens hasn't sent his 3.2 pull to Linus yet): > > http://git.kernel.dk/?p=linux-block.git;a=commit;h=8f02b3a09b1b7d2a4d24b8cd7008f2a441f19a14 > http://git.kernel.dk/?p=linux-block.git;a=commit;h=f26d8f0562da76731cb049943a0e9d9fa81d946a Those two fixes would only address the "kernel BUG at block/blk-flush.c:323!" but not the crash report above, right? Since looking at the changelog they refer to a patch that went in with 3.1-rc1 while the crash report above is with 3.0.7. Oh well... ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) 2011-10-31 13:40 ` Heiko Carstens @ 2011-10-31 14:01 ` Mike Snitzer 0 siblings, 0 replies; 61+ messages in thread From: Mike Snitzer @ 2011-10-31 14:01 UTC (permalink / raw) To: Heiko Carstens Cc: James Bottomley, Jun'ichi Nomura, Steffen Maier, linux-scsi@vger.kernel.org, Jens Axboe, Hannes Reinecke, Linux Kernel, Alan Stern, Thadeu Lima de Souza Cascardo, Taraka R. Bodireddy, Seshagiri N. Ippili, Manvanthara B. Puttashankar, Jeff Moyer, Shaohua Li, gmuelas, dm-devel On Mon, Oct 31 2011 at 9:40am -0400, Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Mon, Oct 31, 2011 at 09:21:58AM -0400, Mike Snitzer wrote: > > > > It _looks_ like we do not hit the BUG_ON() that. This time we get this instead: > > > > > > > > [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000 > > > > [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > > > [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las > > > > t unloaded: scsi_wait_scan] > > > > [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1 > > > > [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > > > [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod]) > > > > [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 > > > > [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d > > > > [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005 > > > > [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000 > > > > [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8 > > > > [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0 > > > > [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5) > > > > [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5) > > > > [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1) > > > > [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1) > > > > [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1 > > > > [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608 > > > > [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5) > > > > [ 4024.938060] Call Trace: > > > > [ 4024.938063] ([<070000000040716c>] 0x70000000040716c) > > > > [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > > > [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398 > > > > [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > > > [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0 > > > > [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > > > [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > > > [ 4024.938118] INFO: lockdep is turned off. > > > > [ 4024.938121] Last Breaking-Event-Address: > > > > [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod] > > > > [ 4024.938135] > > > > [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt > > > > [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1 > > > > [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50) > > > > [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000 > > > > [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a > > > > [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6 > > > > [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000 > > > > [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918 > > > > [ 4024.938205] Call Trace: > > > > [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144) > > > > [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234 > > > > [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168 > > > > [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8 > > > > [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4 > > > > [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4 > > > > [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod] > > > > [ 4024.938329] ([<070000000040716c>] 0x70000000040716c) > > > > [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0 > > > > [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398 > > > > [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c > > > > [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0 > > > > [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc > > > > [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc > > > > [ 4024.938373] INFO: lockdep is turned off. > > > > > > > > So we thought we might as well upgrade to 3.1 but immediately got a > > > > > > > > kernel BUG at block/blk-flush.c:323! > > > > > > > > which was handled here https://lkml.org/lkml/2011/10/4/105 and > > > > here https://lkml.org/lkml/2011/10/12/408 . > > > > > > > > But no patches for that one went upstream AFAICS. > > > > > > Well, all I can say is "hm". You put only a BUG_ON() in the code, which > > > wasn't triggered, but now we get a completely different oops. However, > > > I think it does point to the dm barrier handling code. Can you turn off > > > barriers and see if all oopses go away? > > > > There are two 3.1-stable fixes from Jeff Moyer that Jens staged for > > Linus to pick up (but seems Jens hasn't sent his 3.2 pull to Linus yet): > > > > http://git.kernel.dk/?p=linux-block.git;a=commit;h=8f02b3a09b1b7d2a4d24b8cd7008f2a441f19a14 > > http://git.kernel.dk/?p=linux-block.git;a=commit;h=f26d8f0562da76731cb049943a0e9d9fa81d946a > > Those two fixes would only address the "kernel BUG at block/blk-flush.c:323!" but not the > crash report above, right? Right. > Since looking at the changelog they refer to a patch that went in with 3.1-rc1 while the > crash report above is with 3.0.7. Oh well... Good data point. This is the second request-based DM report I've seen now with 3.0 (first was with fedora on btrfs and request-based DM). Will look closer but it should be noted that DM didn't change significantly in 3.0. So it is likely a lingering oversight from the block changes introduced for onstack plugging (from 2.6.39) or some other change. ^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2011-12-13 16:51 UTC | newest] Thread overview: 61+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-09-22 13:18 [PATCH] block: Free queue resources at blk_release_queue() Hannes Reinecke 2011-09-28 0:47 ` Jens Axboe 2011-09-28 0:55 ` Linus Torvalds 2011-09-28 1:15 ` Jens Axboe 2011-09-28 1:59 ` Linus Torvalds 2011-09-28 2:02 ` Jens Axboe 2011-09-28 4:10 ` James Bottomley 2011-09-28 14:08 ` Jens Axboe 2011-09-28 14:11 ` James Bottomley 2011-09-28 14:14 ` [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) Jens Axboe 2011-09-28 15:22 ` Linus Torvalds 2011-09-28 15:43 ` James Bottomley 2011-09-28 17:48 ` Vivek Goyal 2011-09-28 17:53 ` Christoph Hellwig 2011-09-28 18:09 ` Vivek Goyal 2011-09-28 18:16 ` Christoph Hellwig 2011-09-28 19:05 ` Eric Seppanen 2011-09-28 19:14 ` Christoph Hellwig 2011-11-30 10:18 ` Jens Axboe 2011-11-30 10:26 ` Christoph Hellwig 2011-09-28 22:34 ` Vivek Goyal 2011-09-28 17:59 ` James Bottomley 2011-10-13 13:09 ` Steffen Maier 2011-10-14 16:03 ` James Bottomley 2011-10-17 8:46 ` Jun'ichi Nomura 2011-10-17 14:06 ` James Bottomley 2011-10-18 13:31 ` Jun'ichi Nomura 2011-10-18 15:45 ` Heiko Carstens 2011-10-18 16:29 ` James Bottomley 2011-10-31 10:05 ` Heiko Carstens 2011-10-31 10:42 ` James Bottomley 2011-10-31 11:46 ` Jun'ichi Nomura 2011-10-31 13:00 ` Heiko Carstens 2011-11-02 12:37 ` Jun'ichi Nomura 2011-11-02 12:44 ` Hannes Reinecke 2011-11-02 13:47 ` Heiko Carstens 2011-11-04 4:07 ` Jun'ichi Nomura 2011-11-04 9:12 ` Heiko Carstens 2011-11-03 18:25 ` Mike Snitzer 2011-11-04 9:19 ` Heiko Carstens 2011-11-04 13:30 ` Mike Snitzer 2011-11-04 13:37 ` Hannes Reinecke 2011-11-07 11:31 ` Jun'ichi Nomura 2011-11-07 13:42 ` Mike Snitzer 2011-11-07 12:23 ` Heiko Carstens 2011-11-07 11:30 ` Jun'ichi Nomura 2011-11-07 15:36 ` Mike Snitzer 2011-11-07 16:43 ` Heiko Carstens 2011-11-07 17:10 ` Mike Snitzer 2011-11-07 21:44 ` Mike Snitzer 2011-11-09 9:37 ` Hannes Reinecke 2011-11-10 16:10 ` Heiko Carstens 2011-11-17 16:29 ` Mike Snitzer 2011-11-29 12:00 ` Heiko Carstens 2011-11-29 20:18 ` Mike Snitzer 2011-11-30 7:25 ` Hannes Reinecke 2011-12-12 12:39 ` Heiko Carstens 2011-12-13 16:50 ` Mike Snitzer 2011-10-31 13:21 ` Mike Snitzer 2011-10-31 13:40 ` Heiko Carstens 2011-10-31 14:01 ` Mike Snitzer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).