* [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
@ 2017-07-18 0:44 Jason L Tibbitts III
2017-07-18 7:37 ` Johannes Thumshirn
0 siblings, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-18 0:44 UTC (permalink / raw)
To: linux-scsi; +Cc: linux-kernel, jthumshirn, dvyukov, hare
After updating my tape backup server to 4.12 I found that mtx had issues
controlling the tape library. Good behavior:
[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 4...done
Loading media from Storage Element 5 into drive 0...done
Bad behavior:
[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1046 Failed
This was seen on a machine running Fedora 25 as well as an Ubuntu
machine. Relevant tickets:
https://bugzilla.redhat.com/show_bug.cgi?id=1471302
http://bugzilla.kernel.org/show_bug.cgi?id=196375
https://bugs.launchpad.net/bugs/1704512
mtx in all cases is 1.3.12; in the Fedora case that's
mtx-1.3.12-14.fc24.x86_64. I see this with an Overland Neo T48s library
but the Ubuntu user had a Dell ML6000 and we both have completely
different HBAs and cabling (LSI3008 SAS and qla2462 FC).
I bisected this down to:
commit 28676d869bbb5257b5f14c0c95ad3af3a7019dd5
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date: Fri Apr 7 09:34:15 2017 +0200
scsi: sg: check for valid direction before starting the request
Check for a valid direction before starting the request, otherwise we
risk running into an assertion in the scsi midlayer checking for valid
requests.
[mkp: fixed typo]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
and confirmed that clean unpatched 4.12 shows the problem, while
reverting just that patch fixes the issue. Unfortunately I don't know
enough to actually fix this, but I can easily test patches.
- J<
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-18 0:44 [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control Jason L Tibbitts III @ 2017-07-18 7:37 ` Johannes Thumshirn 2017-07-18 16:53 ` Jason L Tibbitts III 2017-07-18 17:33 ` Jason L Tibbitts III 0 siblings, 2 replies; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-18 7:37 UTC (permalink / raw) To: Jason L Tibbitts III Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Mon, Jul 17, 2017 at 07:44:50PM -0500, Jason L Tibbitts III wrote: > and confirmed that clean unpatched 4.12 shows the problem, while > reverting just that patch fixes the issue. Unfortunately I don't know > enough to actually fix this, but I can easily test patches. This is fixed with: commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 Author: Johannes Thumshirn <jthumshirn@suse.de> Date: Fri Jul 7 10:56:38 2017 +0200 scsi: sg: fix SG_DXFER_FROM_DEV transfers SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set it to NULL for the old sg_io read/write interface, but must have a length bigger than 0. This fixes a regression introduced by commit 28676d869bbb ("scsi: sg: check for valid direction before starting the request") Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the request") Reported-by: Chris Clayton <chris2553@googlemail.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-18 7:37 ` Johannes Thumshirn @ 2017-07-18 16:53 ` Jason L Tibbitts III 2017-07-18 17:33 ` Jason L Tibbitts III 1 sibling, 0 replies; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-18 16:53 UTC (permalink / raw) To: Johannes Thumshirn Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: JT> This is fixed with: commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 Hmm, well, I just pulled and built mainline, which does appear to contain that patch (though it wasn't there when I first started investigating this last week) and the problem is still there. I'll try building clean 4.12 and applying just that patch over the top. - J< ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-18 7:37 ` Johannes Thumshirn 2017-07-18 16:53 ` Jason L Tibbitts III @ 2017-07-18 17:33 ` Jason L Tibbitts III 2017-07-19 7:03 ` Johannes Thumshirn 1 sibling, 1 reply; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-18 17:33 UTC (permalink / raw) To: Johannes Thumshirn Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen I have verified that building a clean v4.12 with 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still shows the problem: [root@backup2 ~]# mtx -f /dev/sg7 next 0 Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=0 (Unknown?!) mtx: Request Sense: Sense Key=No Sense mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 00 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no MOVE MEDIUM from Element Address 1 to 1045 Failed Nothing appears to be logged; is there any kind of debugging information I can collect which might help to track this down? I'm not particularly good at this but I am pretty sure that I'm building everything properly and am actually booting the patched kernel. - J< ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-18 17:33 ` Jason L Tibbitts III @ 2017-07-19 7:03 ` Johannes Thumshirn 2017-07-19 8:13 ` Jason L Tibbitts III 0 siblings, 1 reply; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-19 7:03 UTC (permalink / raw) To: Jason L Tibbitts III Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Tue, Jul 18, 2017 at 12:33:59PM -0500, Jason L Tibbitts III wrote: > I have verified that building a clean v4.12 with > 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still > shows the problem: > > [root@backup2 ~]# mtx -f /dev/sg7 next 0 > Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long > Report=yes > mtx: Request Sense: Valid Residual=no > mtx: Request Sense: Error Code=0 (Unknown?!) > mtx: Request Sense: Sense Key=No Sense > mtx: Request Sense: FileMark=no > mtx: Request Sense: EOM=no > mtx: Request Sense: ILI=no > mtx: Request Sense: Additional Sense Code = 00 > mtx: Request Sense: Additional Sense Qualifier = 00 > mtx: Request Sense: BPV=no > mtx: Request Sense: Error in CDB=no > mtx: Request Sense: SKSV=no > MOVE MEDIUM from Element Address 1 to 1045 Failed > > Nothing appears to be logged; is there any kind of debugging information > I can collect which might help to track this down? I'm not particularly > good at this but I am pretty sure that I'm building everything properly > and am actually booting the patched kernel. Can you please apply this debugging patch, so I can see what's going on. diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 1e82d4128a84..2505fa0b2062 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -753,6 +753,10 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) { + + pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n", + __func__, hp->dxfer_direction, hp->dxfer_len); + switch (hp->dxfer_direction) { case SG_DXFER_NONE: if (hp->dxferp || hp->dxfer_len > 0) Thanks, Johannes -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-19 7:03 ` Johannes Thumshirn @ 2017-07-19 8:13 ` Jason L Tibbitts III 2017-07-19 8:36 ` Johannes Thumshirn 0 siblings, 1 reply; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-19 8:13 UTC (permalink / raw) To: Johannes Thumshirn Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: JT> Can you please apply this debugging patch, so I can see what's going JT> on. Sure, no problem. I generally run "mtx -f /dev/sg7 status" first just to make sure the library is there; this has always worked as expected. With the debug patch applied, this is sent to the console: [ 33.933422] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56 [ 33.940526] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136 [ 33.982429] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 34.569986] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 34.623898] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 Then running "mtx -f /dev/sg7 next 0" gives this as stdout/err: Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=0 (Unknown?!) mtx: Request Sense: Sense Key=No Sense mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 00 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no MOVE MEDIUM from Element Address 1 to 1046 Failed And this to the console: [ 45.552524] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56 [ 45.559626] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136 [ 45.603544] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 46.204614] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 46.258463] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0 Would you also want to see the output from that patch applied to a functioning kernel? - J< ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-19 8:13 ` Jason L Tibbitts III @ 2017-07-19 8:36 ` Johannes Thumshirn 2017-07-21 16:43 ` Johannes Thumshirn 2017-07-25 19:09 ` Douglas Gilbert 0 siblings, 2 replies; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-19 8:36 UTC (permalink / raw) To: Jason L Tibbitts III, Doug Gilbert Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote: > [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0 Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in your case. This would "fix" it but I'm not generally sure it is _the_ solution: diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 1e82d4128a84..b421ec81d775 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) return true; case SG_DXFER_TO_DEV: case SG_DXFER_TO_FROM_DEV: - if (!hp->dxferp || hp->dxfer_len == 0) + if (!hp->dxferp) return false; return true; case SG_DXFER_UNKNOWN: Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't be sure len > 0, can we rely on dxferp being present? Thanks, Johannes -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-19 8:36 ` Johannes Thumshirn @ 2017-07-21 16:43 ` Johannes Thumshirn 2017-07-21 19:23 ` Jason L Tibbitts III 2017-07-25 19:09 ` Douglas Gilbert 1 sibling, 1 reply; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-21 16:43 UTC (permalink / raw) To: Jason L Tibbitts III, Doug Gilbert Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen, jth Doug, On Wed, Jul 19, 2017 at 10:36:54AM +0200, Johannes Thumshirn wrote: > On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote: > > [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0 > > Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in > your case. > > This would "fix" it but I'm not generally sure it is _the_ solution: > > diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c > index 1e82d4128a84..b421ec81d775 100644 > --- a/drivers/scsi/sg.c > +++ b/drivers/scsi/sg.c > @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) > return true; > case SG_DXFER_TO_DEV: > case SG_DXFER_TO_FROM_DEV: > - if (!hp->dxferp || hp->dxfer_len == 0) > + if (!hp->dxferp) > return false; > return true; > case SG_DXFER_UNKNOWN: > > Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't > be sure len > 0, can we rely on dxferp being present? Any comments on this? Jason, can you try the above? If it works and Doug doesn't respond, I'm inclined yo submit this band aid. Thanks, Johannes -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-21 16:43 ` Johannes Thumshirn @ 2017-07-21 19:23 ` Jason L Tibbitts III 2017-07-25 7:18 ` Johannes Thumshirn 0 siblings, 1 reply; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-21 19:23 UTC (permalink / raw) To: Johannes Thumshirn Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen, jth >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: JT> Jason, can you try the above? If it works and Doug doesn't respond, JT> I'm inclined yo submit this band aid. Unfortunately it doesn't appear to work for me. Maybe I'm building the wrong thing, though. I checked out 4.12, cherry picked 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 and then applied your one liner on top of that. There appears to be no change in behavior: [root@backup2 ~]# mtx -f /dev/sg7 next 0 Unloading drive 0 into Storage Element 47...mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=0 (Unknown?!) mtx: Request Sense: Sense Key=No Sense mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 00 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no MOVE MEDIUM from Element Address 1 to 1047 Failed I can also apply the debugging patch and try again if that would give you more useful information. - J< ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-21 19:23 ` Jason L Tibbitts III @ 2017-07-25 7:18 ` Johannes Thumshirn 2017-07-25 18:25 ` Jason L Tibbitts III 0 siblings, 1 reply; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-25 7:18 UTC (permalink / raw) To: Jason L Tibbitts III Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Fri, Jul 21, 2017 at 02:23:16PM -0500, Jason L Tibbitts III wrote: > I can also apply the debugging patch and try again if that would give > you more useful information. Yes please (on top of the snippet I've sent you last). Thanks a lot, Johannes -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-25 7:18 ` Johannes Thumshirn @ 2017-07-25 18:25 ` Jason L Tibbitts III 2017-07-26 7:39 ` Johannes Thumshirn 0 siblings, 1 reply; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-25 18:25 UTC (permalink / raw) To: Johannes Thumshirn Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: JT> Yes please (on top of the snippet I've sent you last). OK, I'm at 4.12 with 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked, plus the fix patch and the debug patch you've sent previously. To make sure we're on the same page, I'll include the patch at the end. Running "mtx -f /dev/sg7 status" gives proper output with this logged to the console: [ 36.742905] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56 [ 36.750036] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136 [ 36.791673] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 37.339790] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 37.393597] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 And running "mtx -f /dev/sg7 next 0" gives the following output: [root@backup2 ~]# mtx -f /dev/sg7 next 0 Unloading drive 0 into Storage Element 1...mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=0 (Unknown?!) mtx: Request Sense: Sense Key=No Sense mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 00 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no MOVE MEDIUM from Element Address 1 to 1001 Failed And the following is logged to the console: [ 192.732294] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56 [ 192.739492] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136 [ 192.781507] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 193.392401] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 193.448970] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240 [ 193.495130] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0 That's not any different than what I provided before, and honestly I wouldn't expect it to be. Is there something else I can log or some debugging switch I can twiddle to give you any more information? I can also try to be more available to try and avoid the timezone-induced day-long cycle time. I'm available on IRC (tibbs on freenode and oftc) and can try to stay up late or get up early or something to try and avoid this time zone mismatch. Here's what an strace of the last mtx call says: open("/dev/sg7", O_RDWR) = 3 ioctl(3, SG_GET_VERSION_NUM, [30536]) = 0 ioctl(3, SG_SET_TIMEOUT, [30000]) = 0 brk(NULL) = 0x55d65f68a000 brk(0x55d65f6ab000) = 0x55d65f6ab000 brk(NULL) = 0x55d65f6ab000 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x12\x00\x00\x00\x38\x00", mx_sb_len=20, iovec_count=0, dxfer_len=56, timeout=30000, flags=0, dxferp="\x08\x80\x05\x02\x45\x00\x00\x02\x42\x44\x54\x20\x20\x20\x20\x20\x46\x6c\x65\x78\x53\x74\x6f\x72\x20\x49\x49\x20\x20\x20\x20\x20"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=1, info=0}) = 0 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x1a\x08\x1d\x00\x88\x00", mx_sb_len=20, iovec_count=0, dxfer_len=136, timeout=300000, flags=0, dxferp="\x17\x00\x00\x00\x9d\x12\x00\x00\x00\x01\x03\xe9\x00\x30\x00\x65\x00\x00\x00\x01\x00\x01\x00\x00", status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=112, duration=61, info=0}) = 0 ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x32\x03\xe9\x00\x30\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x03\xe9\x00\x30\x00\x00\x09\xc8\x02\x80\x00\x34\x00\x00\x09\xc0\x03\xe9\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=1728, duration=542, info=0}) = 0 ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x34\x00\x01\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x01\x00\x01\x00\x00\x00\x3c\x04\x80\x00\x34\x00\x00\x00\x34\x00\x01\x09\x00\x00\x00\x00\x00\x00\x81\x03\xe9\x43\x30\x30\x30"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=47, info=0}) = 0 ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x31\x00\x00\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x00\x00\x01\x00\x00\x00\x3c\x01\x80\x00\x34\x00\x00\x00\x34\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=37, info=0}) = 0 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 write(1, "Unloading drive 0 into Storage E"..., 43) = 43 ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument) write(2, "mtx: Request Sense: Long Report="..., 36) = 36 And here's what I'm testing on top of 4.12: diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 82c33a6..6870e88 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -753,15 +753,22 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) { + + pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n", + __func__, hp->dxfer_direction, hp->dxfer_len); + switch (hp->dxfer_direction) { case SG_DXFER_NONE: if (hp->dxferp || hp->dxfer_len > 0) return false; return true; - case SG_DXFER_TO_DEV: case SG_DXFER_FROM_DEV: + if (hp->dxfer_len < 0) + return false; + return true; + case SG_DXFER_TO_DEV: case SG_DXFER_TO_FROM_DEV: - if (!hp->dxferp || hp->dxfer_len == 0) + if (!hp->dxferp) return false; return true; case SG_DXFER_UNKNOWN: - J< ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-25 18:25 ` Jason L Tibbitts III @ 2017-07-26 7:39 ` Johannes Thumshirn 2017-07-26 20:19 ` Jason L Tibbitts III 0 siblings, 1 reply; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-26 7:39 UTC (permalink / raw) To: Jason L Tibbitts III Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Tue, Jul 25, 2017 at 01:25:07PM -0500, Jason L Tibbitts III wrote: > >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: > ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument) [...] case SG_DXFER_TO_DEV: case SG_DXFER_TO_FROM_DEV: if (!hp->dxferp) return false; return true; So here we go, dxfer_direction is SG_DXFER_TO_DEV, dxfer_len is 0 and dxferp is NULL. We bail out because dxferp is NULL here. I only looked at sg_write() and not sg_new_write() which is called by the SG_IO path not the fops->write() path. It's probably best to just check for dxfer_len <= 2^28 to be valid as Doug suggested: diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 4fe606b000b4..d7ff71e0c85c 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -751,35 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, return count; } -static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) -{ - switch (hp->dxfer_direction) { - case SG_DXFER_NONE: - if (hp->dxferp || hp->dxfer_len > 0) - return false; - return true; - case SG_DXFER_FROM_DEV: - /* - * for SG_DXFER_FROM_DEV we always set dxfer_len to > 0. dxferp - * can either be NULL or != NULL so there's no point in checking - * it either. So just return true. - */ - return true; - case SG_DXFER_TO_DEV: - case SG_DXFER_TO_FROM_DEV: - if (!hp->dxferp || hp->dxfer_len == 0) - return false; - return true; - case SG_DXFER_UNKNOWN: - if ((!hp->dxferp && hp->dxfer_len) || - (hp->dxferp && hp->dxfer_len == 0)) - return false; - return true; - default: - return false; - } -} - static int sg_common_write(Sg_fd * sfp, Sg_request * srp, unsigned char *cmnd, int timeout, int blocking) @@ -800,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, "sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n", (int) cmnd[0], (int) hp->cmd_len)); - if (!sg_is_valid_dxfer(hp)) + if (hp->dxfer_len >= SZ_256M) return -EINVAL; k = sg_start_req(srp, cmnd); -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-26 7:39 ` Johannes Thumshirn @ 2017-07-26 20:19 ` Jason L Tibbitts III 0 siblings, 0 replies; 15+ messages in thread From: Jason L Tibbitts III @ 2017-07-26 20:19 UTC (permalink / raw) To: Johannes Thumshirn Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes: JT> It's probably best to just check for dxfer_len <= 2^28 to be valid JT> as Doug suggested: I can verify that patch on top of git head (as of a few hours ago) does function properly. It didn't apply directly on top of 4.12 but even I can handle fixing that up. The result (just deleting the function and changing the call to a check for hp->dxfer_len >= SZ_256M) works fine and is at the end. So thanks. If this goes in, please CC to stable. - J< diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 82c33a6..aa6f1de 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -751,29 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, return count; } -static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) -{ - switch (hp->dxfer_direction) { - case SG_DXFER_NONE: - if (hp->dxferp || hp->dxfer_len > 0) - return false; - return true; - case SG_DXFER_TO_DEV: - case SG_DXFER_FROM_DEV: - case SG_DXFER_TO_FROM_DEV: - if (!hp->dxferp || hp->dxfer_len == 0) - return false; - return true; - case SG_DXFER_UNKNOWN: - if ((!hp->dxferp && hp->dxfer_len) || - (hp->dxferp && hp->dxfer_len == 0)) - return false; - return true; - default: - return false; - } -} - static int sg_common_write(Sg_fd * sfp, Sg_request * srp, unsigned char *cmnd, int timeout, int blocking) @@ -794,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, "sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n", (int) cmnd[0], (int) hp->cmd_len)); - if (!sg_is_valid_dxfer(hp)) + if (hp->dxfer_len >= SZ_256M) return -EINVAL; k = sg_start_req(srp, cmnd); ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-19 8:36 ` Johannes Thumshirn 2017-07-21 16:43 ` Johannes Thumshirn @ 2017-07-25 19:09 ` Douglas Gilbert 2017-07-26 7:22 ` Johannes Thumshirn 1 sibling, 1 reply; 15+ messages in thread From: Douglas Gilbert @ 2017-07-25 19:09 UTC (permalink / raw) To: Johannes Thumshirn, Jason L Tibbitts III Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On 2017-07-19 04:36 AM, Johannes Thumshirn wrote: > On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote: >> [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0 > > Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in > your case. > > This would "fix" it but I'm not generally sure it is _the_ solution: > > diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c > index 1e82d4128a84..b421ec81d775 100644 > --- a/drivers/scsi/sg.c > +++ b/drivers/scsi/sg.c > @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) > return true; > case SG_DXFER_TO_DEV: > case SG_DXFER_TO_FROM_DEV: > - if (!hp->dxferp || hp->dxfer_len == 0) > + if (!hp->dxferp) > return false; > return true; > case SG_DXFER_UNKNOWN: > > Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't > be sure len > 0, can we rely on dxferp being present? _TO_ is toward the device (i.e. WRITE) and what T10 call "data-out". _FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in". _TO_FROM_ is basically _FROM_ (and has nothing to do with bidi). _TO_FROM_ is very old and is meant to try and detect "short reads" by prefilling the indirect buffer (by reading from dxferp) before it is overwritten by the _FROM_ (and then writing to dxferp). It is from the time when not all LLDs or HBAs provided an indication of a "short read". Today users have the 'sg_io_hdr_t::resid' for that purpose. Whether the sg driver in lk 4.12 still does that I haven't checked. The only limit that should be placed on dxfer_len is something like <= 2**28 (256M) in my opinion. Big enough that the kernel would reject it and small enough to catch negative values placed in an unsigned. The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12 is doing more sanity checks than were done before. My policy was to ignore ("don't care") combinations of dxfer_direction, dxferp and dxfer_len that were harmless. Anyway those three variables are incomplete since the SCSI command and the device also dictate the length of the data-in transfer. Doug Gilbert ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control 2017-07-25 19:09 ` Douglas Gilbert @ 2017-07-26 7:22 ` Johannes Thumshirn 0 siblings, 0 replies; 15+ messages in thread From: Johannes Thumshirn @ 2017-07-26 7:22 UTC (permalink / raw) To: Douglas Gilbert Cc: Jason L Tibbitts III, linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen On Tue, Jul 25, 2017 at 03:09:00PM -0400, Douglas Gilbert wrote: > _TO_ is toward the device (i.e. WRITE) and what T10 call "data-out". > _FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in". > > _TO_FROM_ is basically _FROM_ (and has nothing to do with bidi). > _TO_FROM_ is very old and is meant to try and detect "short reads" > by prefilling the indirect buffer (by reading from dxferp) before > it is overwritten by the _FROM_ (and then writing to dxferp). It > is from the time when not all LLDs or HBAs provided an indication > of a "short read". Today users have the 'sg_io_hdr_t::resid' for > that purpose. Whether the sg driver in lk 4.12 still does that > I haven't checked. > > The only limit that should be placed on dxfer_len is something like > <= 2**28 (256M) in my opinion. Big enough that the kernel would > reject it and small enough to catch negative values placed in an > unsigned. OK. I'll give it a shot, thanks. > > > The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12 > is doing more sanity checks than were done before. My policy was > to ignore ("don't care") combinations of dxfer_direction, dxferp and > dxfer_len that were harmless. Anyway those three variables are > incomplete since the SCSI command and the device also dictate the > length of the data-in transfer. The problem with these don't care combinations was, that user-space could then easily crash the kernel. This is the reason I introduced sg_is_valid_dxfer(). It's sole purpuse was to avoid more CVEs, but unfortunately it turned into quite some regressions. Thanks, Johannes -- Johannes Thumshirn Storage jthumshirn@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-07-26 20:19 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-07-18 0:44 [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control Jason L Tibbitts III 2017-07-18 7:37 ` Johannes Thumshirn 2017-07-18 16:53 ` Jason L Tibbitts III 2017-07-18 17:33 ` Jason L Tibbitts III 2017-07-19 7:03 ` Johannes Thumshirn 2017-07-19 8:13 ` Jason L Tibbitts III 2017-07-19 8:36 ` Johannes Thumshirn 2017-07-21 16:43 ` Johannes Thumshirn 2017-07-21 19:23 ` Jason L Tibbitts III 2017-07-25 7:18 ` Johannes Thumshirn 2017-07-25 18:25 ` Jason L Tibbitts III 2017-07-26 7:39 ` Johannes Thumshirn 2017-07-26 20:19 ` Jason L Tibbitts III 2017-07-25 19:09 ` Douglas Gilbert 2017-07-26 7:22 ` Johannes Thumshirn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox