* [PATCH] scsi: avoid use of reclaimed reference [not found] <cover.1384304973.git.decot@googlers.com> @ 2013-11-13 1:10 ` David Decotigny 2013-11-13 1:57 ` James Bottomley 2013-11-13 12:06 ` Bart Van Assche 0 siblings, 2 replies; 7+ messages in thread From: David Decotigny @ 2013-11-13 1:10 UTC (permalink / raw) To: linux-scsi; +Cc: James E.J. Bottomley, linux-kernel, David Decotigny This patch avoids to use an object after it was potentially reclaimed by scsi_device_put(). Signed-off-by: David Decotigny <decot@googlers.com> --- drivers/scsi/scsi_scan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 307a811..16e4a44 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1498,12 +1498,14 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, out_err: kfree(lun_data); out: - scsi_device_put(sdev); - if (scsi_device_created(sdev)) + if (scsi_device_created(sdev)) { /* * the sdev we used didn't appear in the report luns scan */ __scsi_remove_device(sdev); + } + + scsi_device_put(sdev); return ret; } -- 1.8.4.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-13 1:10 ` [PATCH] scsi: avoid use of reclaimed reference David Decotigny @ 2013-11-13 1:57 ` James Bottomley 2013-11-13 2:09 ` David Decotigny 2013-11-13 12:06 ` Bart Van Assche 1 sibling, 1 reply; 7+ messages in thread From: James Bottomley @ 2013-11-13 1:57 UTC (permalink / raw) To: David Decotigny; +Cc: linux-scsi, linux-kernel On Tue, 2013-11-12 at 17:10 -0800, David Decotigny wrote: > This patch avoids to use an object after it was potentially reclaimed > by scsi_device_put(). The analysis is wrong, I'm afraid. __scsi_remove_device() does the final put for devices that are being destroyed. If the device isn't in the created state, then it's long lived and nothing in the report lun scan does the final put. James ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-13 1:57 ` James Bottomley @ 2013-11-13 2:09 ` David Decotigny 2013-11-13 6:46 ` James Bottomley 0 siblings, 1 reply; 7+ messages in thread From: David Decotigny @ 2013-11-13 2:09 UTC (permalink / raw) To: James Bottomley; +Cc: linux-scsi, linux-kernel@vger.kernel.org I was considering the following scenario wherein the "if (scsi_device_created(sdev))" test at the end would test garbage at best (or unmapped data): if (!(sdev = scsi_device_lookup_by_target(starget, 0))) { // not found sdev = scsi_alloc_sdev(starget, 0, NULL); // -> ref cnt = 2 ... if (scsi_device_get(sdev)) { // -> ref cnt = 3 ... } ... } ... res = scsi_probe_and_add_lun(starget, // -> ref cnt = 1 ... scsi_device_put(sdev); // -> reclaimed if (scsi_device_created(sdev)) // test on garbage or unmapped data (#PF) ... On Tue, Nov 12, 2013 at 5:57 PM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Tue, 2013-11-12 at 17:10 -0800, David Decotigny wrote: >> This patch avoids to use an object after it was potentially reclaimed >> by scsi_device_put(). > > The analysis is wrong, I'm afraid. __scsi_remove_device() does the > final put for devices that are being destroyed. If the device isn't in > the created state, then it's long lived and nothing in the report lun > scan does the final put. > > James > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-13 2:09 ` David Decotigny @ 2013-11-13 6:46 ` James Bottomley 0 siblings, 0 replies; 7+ messages in thread From: James Bottomley @ 2013-11-13 6:46 UTC (permalink / raw) To: David Decotigny; +Cc: linux-scsi, linux-kernel@vger.kernel.org On Tue, 2013-11-12 at 18:09 -0800, David Decotigny wrote: > I was considering the following scenario wherein the "if > (scsi_device_created(sdev))" test at the end would test garbage at > best (or unmapped data): Well, no, the counting isn't right: > if (!(sdev = scsi_device_lookup_by_target(starget, 0))) { // not found > sdev = scsi_alloc_sdev(starget, 0, NULL); // -> ref cnt = 2 1 > if (scsi_device_get(sdev)) { // -> ref cnt = 3 2 > } > ... > } > ... > res = scsi_probe_and_add_lun(starget, // -> > ref cnt = 1 No idea what you think here, where were the other puts? If starget,lun is sdev, then the count goes to 3 here otherwise it stays at 2 if it isn't reported in the scan. ... > scsi_device_put(sdev); // -> reclaimed No, it goes to either 2 or 1 here. If it goes to 1 it's because the sdev was never probed and thus it remains in the created state. > if (scsi_device_created(sdev)) // test on garbage or unmapped data (#PF) Which means this test passes and it gets garbage collected by __scsi_remove_device(). Otherwise we exit with refcount 2. James ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-13 1:10 ` [PATCH] scsi: avoid use of reclaimed reference David Decotigny 2013-11-13 1:57 ` James Bottomley @ 2013-11-13 12:06 ` Bart Van Assche 2013-11-14 2:50 ` David Decotigny 1 sibling, 1 reply; 7+ messages in thread From: Bart Van Assche @ 2013-11-13 12:06 UTC (permalink / raw) To: David Decotigny, linux-scsi; +Cc: James E.J. Bottomley, linux-kernel On 11/13/13 02:10, David Decotigny wrote: > This patch avoids to use an object after it was potentially reclaimed > by scsi_device_put(). > > Signed-off-by: David Decotigny <decot@googlers.com> > --- > drivers/scsi/scsi_scan.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 307a811..16e4a44 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -1498,12 +1498,14 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, > out_err: > kfree(lun_data); > out: > - scsi_device_put(sdev); > - if (scsi_device_created(sdev)) > + if (scsi_device_created(sdev)) { > /* > * the sdev we used didn't appear in the report luns scan > */ > __scsi_remove_device(sdev); > + } > + > + scsi_device_put(sdev); > return ret; > } It would help if you could explain why you started looking at this code. Is the above patch something you came up with after having analyzed the SCSI mid-layer source code or perhaps as the result of a test that failed ? If so, which test was it that failed ? Thanks, Bart. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-13 12:06 ` Bart Van Assche @ 2013-11-14 2:50 ` David Decotigny 2013-11-14 6:15 ` James Bottomley 0 siblings, 1 reply; 7+ messages in thread From: David Decotigny @ 2013-11-14 2:50 UTC (permalink / raw) To: Bart Van Assche Cc: linux-scsi, James E.J. Bottomley, linux-kernel@vger.kernel.org Hello, Thank you for looking into this. I could reproduce the oops on some Dell Poweredge R720 with the following config flags, otherwise the problem goes un-noticed: CONFIG_DEBUG_PAGEALLOC=y CONFIG_DEBUG_SLAB=y [ 4.924033] BUG: unable to handle kernel paging request at ffff88000004dd10 [ 4.931823] IP: [<ffffffff8139797f>] __scsi_scan_target+0x3ef/0x6f0 [ 4.938846] PGD 1ba1067 PUD 1ba2067 PMD 1ba3067 PTE 800000000004d060 [ 4.945985] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 4.951074] Modules linked in: [ 4.954492] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-smp-scsi01 #1 This points to this line on the return path of scsi_report_lun_scan: if (scsi_device_created(sdev)) Kernel is jejb/scsi/for-next at 2aee240c68ed32 and I could reproduce the bug with other 3.x kernels on same hardware. For me, it is 100% reproducible. The ref counter values I indicated in my previous email are the result of a basic instrumentation. It shows that ref count drops from 3 to 1 as a result of scsi_probe_and_add_lun(). I believe this is because the latter calls __scsi_remove_device(sdev). Now, if sdev reclaiming is not allowed to happen at the end of scsi_report_lun_scan by design because someone else is expected to hold a reference to it, then I'd be happy to add a BUG_ON() on the return path and explicit the post-condition in the function documentation, and also try to find out where a ref is killed by mistake. However, if sdev relcaiming at the end of scsi_report_lun_scan is allowed, then I'd argue that the "if (scsi_device_created(sdev))" on the potentially reclaimed sdev is not right, that's why I was proposing this patch. Regards, On Wed, Nov 13, 2013 at 4:06 AM, Bart Van Assche <bvanassche@acm.org> wrote: > On 11/13/13 02:10, David Decotigny wrote: >> >> This patch avoids to use an object after it was potentially reclaimed >> by scsi_device_put(). >> >> Signed-off-by: David Decotigny <decot@googlers.com> >> --- >> drivers/scsi/scsi_scan.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c >> index 307a811..16e4a44 100644 >> --- a/drivers/scsi/scsi_scan.c >> +++ b/drivers/scsi/scsi_scan.c >> @@ -1498,12 +1498,14 @@ static int scsi_report_lun_scan(struct scsi_target >> *starget, int bflags, >> out_err: >> kfree(lun_data); >> out: >> - scsi_device_put(sdev); >> - if (scsi_device_created(sdev)) >> + if (scsi_device_created(sdev)) { >> /* >> * the sdev we used didn't appear in the report luns scan >> */ >> __scsi_remove_device(sdev); >> + } >> + >> + scsi_device_put(sdev); >> return ret; >> } > > > It would help if you could explain why you started looking at this code. Is > the above patch something you came up with after having analyzed the SCSI > mid-layer source code or perhaps as the result of a test that failed ? If > so, which test was it that failed ? > > Thanks, > > Bart. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] scsi: avoid use of reclaimed reference 2013-11-14 2:50 ` David Decotigny @ 2013-11-14 6:15 ` James Bottomley 0 siblings, 0 replies; 7+ messages in thread From: James Bottomley @ 2013-11-14 6:15 UTC (permalink / raw) To: David Decotigny; +Cc: Bart Van Assche, linux-scsi, linux-kernel@vger.kernel.org On Wed, 2013-11-13 at 18:50 -0800, David Decotigny wrote: > Hello, > > Thank you for looking into this. I could reproduce the oops on some > Dell Poweredge R720 with the following config flags, otherwise the > problem goes un-noticed: > > CONFIG_DEBUG_PAGEALLOC=y > CONFIG_DEBUG_SLAB=y > > [ 4.924033] BUG: unable to handle kernel paging request at ffff88000004dd10 > [ 4.931823] IP: [<ffffffff8139797f>] __scsi_scan_target+0x3ef/0x6f0 > [ 4.938846] PGD 1ba1067 PUD 1ba2067 PMD 1ba3067 PTE 800000000004d060 > [ 4.945985] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC > [ 4.951074] Modules linked in: > [ 4.954492] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-smp-scsi01 #1 > > This points to this line on the return path of scsi_report_lun_scan: > if (scsi_device_created(sdev)) > > Kernel is jejb/scsi/for-next at 2aee240c68ed32 and I could reproduce > the bug with other 3.x kernels on same hardware. For me, it is 100% > reproducible. > > The ref counter values I indicated in my previous email are the result > of a basic instrumentation. It shows that ref count drops from 3 to 1 > as a result of scsi_probe_and_add_lun(). I believe this is because the > latter calls __scsi_remove_device(sdev). > > Now, if sdev reclaiming is not allowed to happen at the end of > scsi_report_lun_scan by design because someone else is expected to > hold a reference to it, then I'd be happy to add a BUG_ON() on the > return path and explicit the post-condition in the function > documentation, and also try to find out where a ref is killed by > mistake. However, if sdev relcaiming at the end of > scsi_report_lun_scan is allowed, then I'd argue that the "if > (scsi_device_created(sdev))" on the potentially reclaimed sdev is not > right, that's why I was proposing this patch. Heh, perhaps this is why bug reports are so useful. Your patch told me where you thought the bug was but this report actually tells me where the bug is: it's in scsi_probe_and_add_luns(). There's no way that routine should reduce the refcount of a struct scsi_device. It should either leave it as it is, or if an sdevp is specified, increment the reference and return the sdev in sdevp. This should be enough information for you to come up with the fix. Please document it with the actual bug details like you have above and I'll apply it. James ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-11-14 6:15 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <cover.1384304973.git.decot@googlers.com>
2013-11-13 1:10 ` [PATCH] scsi: avoid use of reclaimed reference David Decotigny
2013-11-13 1:57 ` James Bottomley
2013-11-13 2:09 ` David Decotigny
2013-11-13 6:46 ` James Bottomley
2013-11-13 12:06 ` Bart Van Assche
2013-11-14 2:50 ` David Decotigny
2013-11-14 6:15 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox