linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] megaraid: add scsi_cmnd NULL check before use
@ 2016-05-08  0:40 Petros Koutoupis
  2016-05-08  3:32 ` Finn Thain
  0 siblings, 1 reply; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-08  0:40 UTC (permalink / raw)
  To: kashyap.desai, sumit.saxena, uday.lingala; +Cc: megaraidlinux.pdl, linux-scsi

The current state of the code checks to see if the reference to scsi_cmnd is
not null, but it never checks to see if it is null and always assumes it is valid
before its use in below switch statement. This patch addresses that.

--- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
+++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
@@ -2277,6 +2277,10 @@ complete_cmd_fusion(struct megasas_insta
 
 		if (cmd_fusion->scmd)
 			cmd_fusion->scmd->SCp.ptr = NULL;
+		else if ((!cmd_fusion->scmd) &&
+			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
+			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
+			goto next;
 
 		scmd_local = cmd_fusion->scmd;
 		status = scsi_io_req->RaidContext.status;
@@ -2336,7 +2340,7 @@ complete_cmd_fusion(struct megasas_insta
 				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
 			break;
 		}
-
+next:
 		fusion->last_reply_idx[MSIxIndex]++;
 		if (fusion->last_reply_idx[MSIxIndex] >=
 		    fusion->reply_q_depth)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08  0:40 [PATCH] megaraid: add scsi_cmnd NULL check before use Petros Koutoupis
@ 2016-05-08  3:32 ` Finn Thain
  2016-05-08 12:08   ` Petros Koutoupis
  2016-05-09  8:05   ` Dan Carpenter
  0 siblings, 2 replies; 20+ messages in thread
From: Finn Thain @ 2016-05-08  3:32 UTC (permalink / raw)
  To: Petros Koutoupis
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi, Dan Carpenter


On Sat, 7 May 2016, Petros Koutoupis wrote:

> The current state of the code checks to see if the reference to 
> scsi_cmnd is not null, but it never checks to see if it is null and 
> always assumes it is valid before its use in below switch statement. 
> This patch addresses that.
> 

There is no sign-off here.

> --- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
> +++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
> @@ -2277,6 +2277,10 @@ complete_cmd_fusion(struct megasas_insta
>  
>  		if (cmd_fusion->scmd)
>  			cmd_fusion->scmd->SCp.ptr = NULL;
> +		else if ((!cmd_fusion->scmd) &&
> +			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
> +			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
> +			goto next;

That contains a tautology.

>  
>  		scmd_local = cmd_fusion->scmd;
>  		status = scsi_io_req->RaidContext.status;
> @@ -2336,7 +2340,7 @@ complete_cmd_fusion(struct megasas_insta
>  				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
>  			break;
>  		}
> -
> +next:
>  		fusion->last_reply_idx[MSIxIndex]++;
>  		if (fusion->last_reply_idx[MSIxIndex] >=
>  		    fusion->reply_q_depth)
> 
> 

Your patch is equivalent to this one:

@@ -2294,6 +2294,8 @@
 			complete(&cmd_fusion->done);
 			break;
 		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
+			if (unlikely(!scmd_local))
+				break;
 			/* Update load balancing info */
 			device_id = MEGASAS_DEV_INDEX(scmd_local);
 			lbinfo = &fusion->load_balance_info[device_id];
@@ -2311,6 +2313,8 @@
 			}
 			/* Fall thru and complete IO */
 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
+			if (unlikely(!scmd_local))
+				break;
 			/* Map the FW Cmd Status */
 			map_cmd_status(cmd_fusion, status, extStatus);
 			scsi_io_req->RaidContext.status = 0;


I don't know anything about this driver or hardware so I'm not actually 
advocating any of these changes, just pointing out that your patch has 
issues.

I would have expected the 'smatch' checker to detect either the potential 
NULL pointer derefs or alternatively the redundant test for a non-NULL 
pointer. Maybe it has detected this already (?) or maybe the NULL pointer 
deref can be shown to be impossible?

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08  3:32 ` Finn Thain
@ 2016-05-08 12:08   ` Petros Koutoupis
  2016-05-08 12:22     ` Finn Thain
  2016-05-09  8:05   ` Dan Carpenter
  1 sibling, 1 reply; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-08 12:08 UTC (permalink / raw)
  To: Finn Thain
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi, Dan Carpenter, petros@petroskoutoupis.com

On Sun, 2016-05-08 at 13:32 +1000, Finn Thain wrote:
> On Sat, 7 May 2016, Petros Koutoupis wrote:
> 
> > The current state of the code checks to see if the reference to 
> > scsi_cmnd is not null, but it never checks to see if it is null and 
> > always assumes it is valid before its use in below switch statement. 
> > This patch addresses that.
> > 
> 
> There is no sign-off here.
> 

Oops. Will resend.

> > --- linux/drivers/scsi/megaraid/megaraid_sas_fusion.c.orig	2016-05-07 09:12:56.748969851 -0500
> > +++ linux/drivers/scsi/megaraid/megaraid_sas_fusion.c	2016-05-07 09:15:29.612967113 -0500
> > @@ -2277,6 +2277,10 @@ complete_cmd_fusion(struct megasas_insta
> >  
> >  		if (cmd_fusion->scmd)
> >  			cmd_fusion->scmd->SCp.ptr = NULL;
> > +		else if ((!cmd_fusion->scmd) &&
> > +			 ((scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
> > +			 (scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)))
> > +			goto next;
> 
> That contains a tautology.
> 

How so? The very definition of tautology implies that I am repeating
the same kind of functionality in two different ways. The thing is,
this function routine does not always need to have a valid scsi_cmnd
reference, which is why in the first it checks if it is valid. In
the second, it only checks if it is null under the only two cases
which make use of it.

> >  
> >  		scmd_local = cmd_fusion->scmd;
> >  		status = scsi_io_req->RaidContext.status;
> > @@ -2336,7 +2340,7 @@ complete_cmd_fusion(struct megasas_insta
> >  				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
> >  			break;
> >  		}
> > -
> > +next:
> >  		fusion->last_reply_idx[MSIxIndex]++;
> >  		if (fusion->last_reply_idx[MSIxIndex] >=
> >  		    fusion->reply_q_depth)
> > 
> > 
> 
> Your patch is equivalent to this one:
> 
> @@ -2294,6 +2294,8 @@
>  			complete(&cmd_fusion->done);
>  			break;
>  		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
> +			if (unlikely(!scmd_local))
> +				break;
>  			/* Update load balancing info */
>  			device_id = MEGASAS_DEV_INDEX(scmd_local);
>  			lbinfo = &fusion->load_balance_info[device_id];
> @@ -2311,6 +2313,8 @@
>  			}
>  			/* Fall thru and complete IO */
>  		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
> +			if (unlikely(!scmd_local))
> +				break;
>  			/* Map the FW Cmd Status */
>  			map_cmd_status(cmd_fusion, status, extStatus);
>  			scsi_io_req->RaidContext.status = 0;
> 
> 

Yeah, except your patch introduces duplication of the same code and
even worse, if the command starts in the MPI2_FUNCTION_SCSI_IO_REQUEST
case, it will check the status of the pointer twice which seems pretty
inefficient for systems running such high loads (i.e. the "Fall
thru").


> I don't know anything about this driver or hardware so I'm not actually 
> advocating any of these changes, just pointing out that your patch has 
> issues.
> 
> I would have expected the 'smatch' checker to detect either the potential 
> NULL pointer derefs or alternatively the redundant test for a non-NULL 
> pointer. Maybe it has detected this already (?) or maybe the NULL pointer 
> deref can be shown to be impossible?
> 

It is very possible. We have had customers in the field hit pretty
hard by this.

> -- 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08 12:08   ` Petros Koutoupis
@ 2016-05-08 12:22     ` Finn Thain
  2016-05-08 16:34       ` Petros Koutoupis
  0 siblings, 1 reply; 20+ messages in thread
From: Finn Thain @ 2016-05-08 12:22 UTC (permalink / raw)
  To: Petros Koutoupis
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi, Dan Carpenter


On Sun, 8 May 2016, Petros Koutoupis wrote:

> > 
> > That contains a tautology.
> > 
> 
> How so?

if (x)
	/* ... */
else if (!x && (whatever))
	/* ... */

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08 12:22     ` Finn Thain
@ 2016-05-08 16:34       ` Petros Koutoupis
  2016-05-09  1:35         ` Julian Calaby
  0 siblings, 1 reply; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-08 16:34 UTC (permalink / raw)
  To: Finn Thain
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi, Dan Carpenter

On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
> On Sun, 8 May 2016, Petros Koutoupis wrote:
> 
> > > 
> > > That contains a tautology.
> > > 
> > 
> > How so?
> 
> if (x)
> 	/* ... */
> else if (!x && (whatever))
> 	/* ... */
> 
> -- 

Thank you but I know the logic of what I wrote. A tautology
will yield the same results no matter what the interpretation.
That is not a tautology. The two conditionals in my case check
different states and serve different purposes.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08 16:34       ` Petros Koutoupis
@ 2016-05-09  1:35         ` Julian Calaby
  2016-05-09  2:59           ` Petros Koutoupis
  0 siblings, 1 reply; 20+ messages in thread
From: Julian Calaby @ 2016-05-09  1:35 UTC (permalink / raw)
  To: Petros Koutoupis
  Cc: Finn Thain, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi, Dan Carpenter

Hi Petros,

On Mon, May 9, 2016 at 2:34 AM, Petros Koutoupis
<petros@petroskoutoupis.com> wrote:
> On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
>> On Sun, 8 May 2016, Petros Koutoupis wrote:
>>
>> > >
>> > > That contains a tautology.
>> > >
>> >
>> > How so?
>>
>> if (x)
>>       /* ... */
>> else if (!x && (whatever))
>>       /* ... */
>>
>> --
>
> Thank you but I know the logic of what I wrote. A tautology
> will yield the same results no matter what the interpretation.
> That is not a tautology. The two conditionals in my case check
> different states and serve different purposes.

You're missing the point.

Execution will only reach the else branch if "!cmd_fusion->scmd",
hence checking that is unnecessary. Removing that test (and all the
unnecessary parentheses) will reduce the second test to:

else if (scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
        scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)

which is much cleaner.

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09  1:35         ` Julian Calaby
@ 2016-05-09  2:59           ` Petros Koutoupis
  0 siblings, 0 replies; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-09  2:59 UTC (permalink / raw)
  To: Julian Calaby
  Cc: Finn Thain, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi, Dan Carpenter

On Mon, 2016-05-09 at 11:35 +1000, Julian Calaby wrote:
> Hi Petros,
> 
> On Mon, May 9, 2016 at 2:34 AM, Petros Koutoupis
> <petros@petroskoutoupis.com> wrote:
> > On Sun, 2016-05-08 at 22:22 +1000, Finn Thain wrote:
> >> On Sun, 8 May 2016, Petros Koutoupis wrote:
> >>
> >> > >
> >> > > That contains a tautology.
> >> > >
> >> >
> >> > How so?
> >>
> >> if (x)
> >>       /* ... */
> >> else if (!x && (whatever))
> >>       /* ... */
> >>
> >> --
> >
> > Thank you but I know the logic of what I wrote. A tautology
> > will yield the same results no matter what the interpretation.
> > That is not a tautology. The two conditionals in my case check
> > different states and serve different purposes.
> 
> You're missing the point.
> 
> Execution will only reach the else branch if "!cmd_fusion->scmd",
> hence checking that is unnecessary. Removing that test (and all the
> unnecessary parentheses) will reduce the second test to:
> 
> else if (scsi_io_req->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
>         scsi_io_req->Function == MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST)
> 
> which is much cleaner.
> 
> Thanks,
> 

Julian,

I agree. I will clean up and resend. Thank you.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-08  3:32 ` Finn Thain
  2016-05-08 12:08   ` Petros Koutoupis
@ 2016-05-09  8:05   ` Dan Carpenter
  2016-05-09  9:43     ` Dan Carpenter
  2016-05-09  9:48     ` Sumit Saxena
  1 sibling, 2 replies; 20+ messages in thread
From: Dan Carpenter @ 2016-05-09  8:05 UTC (permalink / raw)
  To: Finn Thain
  Cc: Petros Koutoupis, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi

Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
NULL then assign "scmd_local = cmd_fusion->scmd;" and dereference
scmd_local unconditionally...

It does catch part of the bug if you have cross function analysis:

  drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
  error: we previously assumed 'cmd_fusion->scmd' could be null (see line 2281)

But that code was from 2010 so I never reported it to the original
author or the list.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09  8:05   ` Dan Carpenter
@ 2016-05-09  9:43     ` Dan Carpenter
  2016-05-09  9:48     ` Sumit Saxena
  1 sibling, 0 replies; 20+ messages in thread
From: Dan Carpenter @ 2016-05-09  9:43 UTC (permalink / raw)
  To: Finn Thain
  Cc: Petros Koutoupis, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi

I've updated Smatch to catch these and I'm testing it now.  We'll see
how it goes.  If my quick and dirty method doesn't has too many false
positives then it's will take some months before I get around to doing
it properly...

Thanks for notifying me on this.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09  8:05   ` Dan Carpenter
  2016-05-09  9:43     ` Dan Carpenter
@ 2016-05-09  9:48     ` Sumit Saxena
  2016-05-09 19:29       ` Dan Carpenter
  2016-05-09 21:28       ` Petros Koutoupis
  1 sibling, 2 replies; 20+ messages in thread
From: Sumit Saxena @ 2016-05-09  9:48 UTC (permalink / raw)
  To: Dan Carpenter, Finn Thain
  Cc: Petros Koutoupis, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi

> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Monday, May 09, 2016 1:36 PM
> To: Finn Thain
> Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
NULL
> then assign "scmd_local = cmd_fusion->scmd;" and dereference scmd_local
> unconditionally...
>
> It does catch part of the bug if you have cross function analysis:
>
>   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
>   error: we previously assumed 'cmd_fusion->scmd' could be null (see
line 2281)
>
> But that code was from 2010 so I never reported it to the original
author or the
> list.
"cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set to
MPI2_FUNCTION_SCSI_IO_REQUEST (OR) MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
(inside these two cases only, cmd_fusion->scmd will be dereferenced). If
cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that will a
BUG and
should not continue with other commands processing in that case.

>
> regards,
> dan carpenter

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09  9:48     ` Sumit Saxena
@ 2016-05-09 19:29       ` Dan Carpenter
  2016-05-09 21:28       ` Petros Koutoupis
  1 sibling, 0 replies; 20+ messages in thread
From: Dan Carpenter @ 2016-05-09 19:29 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Finn Thain, Petros Koutoupis, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi

I'm confused what you are saying.  According to Petros, these bugs are
causing failures in real life, I was only added to the CC list because
Smatch should have warned about them.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09  9:48     ` Sumit Saxena
  2016-05-09 19:29       ` Dan Carpenter
@ 2016-05-09 21:28       ` Petros Koutoupis
  2016-05-11  9:41         ` Sumit Saxena
  1 sibling, 1 reply; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-09 21:28 UTC (permalink / raw)
  To: Sumit Saxena, Dan Carpenter, Finn Thain
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi

On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > 
> > -----Original Message-----
> > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > Sent: Monday, May 09, 2016 1:36 PM
> > To: Finn Thain
> > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > 
> > Smatch doesn't quite catch it because we check "cmd_fusion->scmd" for
> NULL
> > 
> > then assign "scmd_local = cmd_fusion->scmd;" and dereference scmd_local
> > unconditionally...
> > 
> > It does catch part of the bug if you have cross function analysis:
> > 
> >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318 complete_cmd_fusion()
> >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> line 2281)
> > 
> > 
> > But that code was from 2010 so I never reported it to the original
> author or the
> > 
> > list.
> "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set to
> MPI2_FUNCTION_SCSI_IO_REQUEST (OR) MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> (inside these two cases only, cmd_fusion->scmd will be dereferenced). If
> cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that will a
> BUG and
> should not continue with other commands processing in that case.
> 

Sumit,

To clarify, a detection of cmd_fusion->scmd being NULL with
scsi_io_req->Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
BUG() and not attempt to iterate to the next command in the
list. Thank you.

--
Petros
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-09 21:28       ` Petros Koutoupis
@ 2016-05-11  9:41         ` Sumit Saxena
  2016-05-12  1:49           ` Petros Koutoupis
  0 siblings, 1 reply; 20+ messages in thread
From: Sumit Saxena @ 2016-05-11  9:41 UTC (permalink / raw)
  To: Petros Koutoupis, Dan Carpenter, Finn Thain
  Cc: kashyap.desai, sumit.saxena, uday.lingala, megaraidlinux.pdl,
	linux-scsi

> -----Original Message-----
> From: Petros Koutoupis [mailto:petros@petroskoutoupis.com]
> Sent: Tuesday, May 10, 2016 2:59 AM
> To: Sumit Saxena; Dan Carpenter; Finn Thain
> Cc: kashyap.desai@avagotech.com; sumit.saxena@avagotech.com;
> uday.lingala@avagotech.com; megaraidlinux.pdl@avagotech.com; linux-
> scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > >
> > > -----Original Message-----
> > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > > Sent: Monday, May 09, 2016 1:36 PM
> > > To: Finn Thain
> > > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > >
> > > Smatch doesn't quite catch it because we check "cmd_fusion->scmd"
> > > for
> > NULL
> > >
> > > then assign "scmd_local = cmd_fusion->scmd;" and dereference
> > > scmd_local unconditionally...
> > >
> > > It does catch part of the bug if you have cross function analysis:
> > >
> > >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318
> > > complete_cmd_fusion()
> > >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> > line 2281)
> > >
> > >
> > > But that code was from 2010 so I never reported it to the original
> > author or the
> > >
> > > list.
> > "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set
> > to MPI2_FUNCTION_SCSI_IO_REQUEST (OR)
> > MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> > (inside these two cases only, cmd_fusion->scmd will be dereferenced).
> > If cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that
> > will a BUG and should not continue with other commands processing in
> > that case.
> >
>
> Sumit,
>
> To clarify, a detection of cmd_fusion->scmd being NULL with scsi_io_req-
> >Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
> MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
> BUG() and not attempt to iterate to the next command in the list. Thank
> you.

Petros,

WARN_ON() can be used in this case. Upstream may have concerns on using
BUG_ON() and also BUG_ON() won't help in this case. In production
environment we never encountered this.

Thanks,
Sumit
>
> --
> Petros

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-11  9:41         ` Sumit Saxena
@ 2016-05-12  1:49           ` Petros Koutoupis
  2016-05-12  6:35             ` Dan Carpenter
  0 siblings, 1 reply; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-12  1:49 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Dan Carpenter, Finn Thain, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi

On Wed, 2016-05-11 at 15:11 +0530, Sumit Saxena wrote:
> > -----Original Message-----
> > From: Petros Koutoupis [mailto:petros@petroskoutoupis.com]
> > Sent: Tuesday, May 10, 2016 2:59 AM
> > To: Sumit Saxena; Dan Carpenter; Finn Thain
> > Cc: kashyap.desai@avagotech.com; sumit.saxena@avagotech.com;
> > uday.lingala@avagotech.com; megaraidlinux.pdl@avagotech.com; linux-
> > scsi@vger.kernel.org
> > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> >
> > On Mon, 2016-05-09 at 15:18 +0530, Sumit Saxena wrote:
> > > >
> > > > -----Original Message-----
> > > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > > > Sent: Monday, May 09, 2016 1:36 PM
> > > > To: Finn Thain
> > > > Cc: Petros Koutoupis; kashyap.desai@avagotech.com;
> > > > sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> > > > megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> > > > Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
> > > >
> > > > Smatch doesn't quite catch it because we check "cmd_fusion->scmd"
> > > > for
> > > NULL
> > > >
> > > > then assign "scmd_local = cmd_fusion->scmd;" and dereference
> > > > scmd_local unconditionally...
> > > >
> > > > It does catch part of the bug if you have cross function analysis:
> > > >
> > > >   drivers/scsi/megaraid/megaraid_sas_fusion.c:2318
> > > > complete_cmd_fusion()
> > > >   error: we previously assumed 'cmd_fusion->scmd' could be null (see
> > > line 2281)
> > > >
> > > >
> > > > But that code was from 2010 so I never reported it to the original
> > > author or the
> > > >
> > > > list.
> > > "cmd_fusion->scmd" should not be NULL if scsi_io_req->Function is set
> > > to MPI2_FUNCTION_SCSI_IO_REQUEST (OR)
> > > MEGASAS_MPI2_FUNCTION_LD_IO_REQUES
> > > (inside these two cases only, cmd_fusion->scmd will be dereferenced).
> > > If cmd_fusion->scmd is NULL for these "scsi_io_req->Function", that
> > > will a BUG and should not continue with other commands processing in
> > > that case.
> > >
> >
> > Sumit,
> >
> > To clarify, a detection of cmd_fusion->scmd being NULL with scsi_io_req-
> > >Function set to MPI2_FUNCTION_SCSI_IO_REQUEST or
> > MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST should instead trigger a
> > BUG() and not attempt to iterate to the next command in the list. Thank
> > you.
> 
> Petros,
> 
> WARN_ON() can be used in this case. Upstream may have concerns on using
> BUG_ON() and also BUG_ON() won't help in this case. In production
> environment we never encountered this.
> 
> Thanks,
> Sumit
> >
> > --
> > Petros

Sumit,

I will resubmit the patch with all the recommendations. Thank you. In case
you are interested, I have a crash file showcasing the error. I can always
provide this outside of this mailing thread.

--
Petros



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-12  1:49           ` Petros Koutoupis
@ 2016-05-12  6:35             ` Dan Carpenter
  2016-05-12 12:35               ` Sumit Saxena
  2016-05-13 23:34               ` Petros Koutoupis
  0 siblings, 2 replies; 20+ messages in thread
From: Dan Carpenter @ 2016-05-12  6:35 UTC (permalink / raw)
  To: Petros Koutoupis
  Cc: Sumit Saxena, Finn Thain, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi

On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> Sumit,
> 
> I will resubmit the patch with all the recommendations. Thank you. In case
> you are interested, I have a crash file showcasing the error. I can always
> provide this outside of this mailing thread.
> 

Please send it to the thread.

To be honest, I totally can't understand this thread.  Sumit says it is
impossible and you are saying that you have seen it happen in real life.
Are you using out of tree code or something?  What are you doing that is
unexpected?

I don't see the point of adding a WARN_ON().  NULL derefs normally
generate a pretty clear stack trace already (unless they are caused by
memory corruption).  Why are we not just fixing the bugs instead of
warning and then crashing.

Also when I'm doing static analysis people always tell me that "that bug
is impossible, trust me." and instead of trusting people I really wish
they would just show me the relevant code that prevents it from
happening.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-12  6:35             ` Dan Carpenter
@ 2016-05-12 12:35               ` Sumit Saxena
  2016-05-13  7:43                 ` Finn Thain
  2016-05-13 23:34               ` Petros Koutoupis
  1 sibling, 1 reply; 20+ messages in thread
From: Sumit Saxena @ 2016-05-12 12:35 UTC (permalink / raw)
  To: Dan Carpenter, Petros Koutoupis
  Cc: Finn Thain, kashyap.desai, sumit.saxena, uday.lingala,
	megaraidlinux.pdl, linux-scsi

> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Thursday, May 12, 2016 12:05 PM
> To: Petros Koutoupis
> Cc: Sumit Saxena; Finn Thain; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
> On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> > Sumit,
> >
> > I will resubmit the patch with all the recommendations. Thank you. In
> > case you are interested, I have a crash file showcasing the error. I
> > can always provide this outside of this mailing thread.
> >
>
> Please send it to the thread.
>
> To be honest, I totally can't understand this thread.  Sumit says it is
impossible
> and you are saying that you have seen it happen in real life.
> Are you using out of tree code or something?  What are you doing that is
> unexpected?
>
> I don't see the point of adding a WARN_ON().  NULL derefs normally
generate a
> pretty clear stack trace already (unless they are caused by memory
corruption).
> Why are we not just fixing the bugs instead of warning and then
crashing.
Agree, if there scsi_cmnd is coming as NULL, please attach logs. I will
look into them.
>
> Also when I'm doing static analysis people always tell me that "that bug
is
> impossible, trust me." and instead of trusting people I really wish they
would just
> show me the relevant code that prevents it from happening.
Inside megasas_build_io_fusion() function,  driver sets "cmd->scmd"
pointer(SCSI command pointer received from SCSI mid layer). Functions
called inside megasas_build_io_fusion()(which actually builds frame to be
sent to firmware)
are setting Function type- MPI2_FUNCTION_SCSI_IO_REQUEST (or)
MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST. So in case Function type set to any
one these two, there must be valid "cmd->scmd".

Thanks,
Sumit

>
> regards,
> dan carpenter

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-12 12:35               ` Sumit Saxena
@ 2016-05-13  7:43                 ` Finn Thain
  2016-05-13  7:43                   ` Sumit Saxena
  0 siblings, 1 reply; 20+ messages in thread
From: Finn Thain @ 2016-05-13  7:43 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Dan Carpenter, Petros Koutoupis, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi


On Thu, 12 May 2016, Sumit Saxena wrote:

> > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> >
> > Also when I'm doing static analysis people always tell me that "that 
> > bug is impossible, trust me." and instead of trusting people I really 
> > wish they would just show me the relevant code that prevents it from 
> > happening.
> 
> Inside megasas_build_io_fusion() function, driver sets "cmd->scmd" 
> pointer(SCSI command pointer received from SCSI mid layer). Functions 
> called inside megasas_build_io_fusion()(which actually builds frame to 
> be sent to firmware) are setting Function type- 
> MPI2_FUNCTION_SCSI_IO_REQUEST (or) MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST. 
> So in case Function type set to any one these two, there must be valid 
> "cmd->scmd".

That doesn't show what prevents the bug. It merely shows that the bug does 
not always manifest.

For example, you might check whether anything prevents 
megasas_build_io_fusion() from returning before assigning cmd->scmd, like 
so:

2112 	if (sge_count > instance->max_num_sge) {
2113 		dev_err(&instance->pdev->dev, "Error. sge_count (0x%x) exceeds "
2114 		       "max (0x%x) allowed\n", sge_count,
2115 		       instance->max_num_sge);
2116 		return 1;
2117 	}

Another possibility: cmd->io_request->Function is valid yet cmd->scmd is 
NULL when seen from the interrupt handler if it intervenes between the two 
statements in megasas_return_cmd_fusion():

180 inline void megasas_return_cmd_fusion(struct megasas_instance *instance,
181 	struct megasas_cmd_fusion *cmd)
182 {
183 	cmd->scmd = NULL;
184 	memset(cmd->io_request, 0, sizeof(struct MPI2_RAID_SCSI_IO_REQUEST));
185 }

You might want to confirm that locking always prevents that.

OTOH, without an actual backtrace, I too might be reluctant to pursue this 
kind of speculation.

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-13  7:43                 ` Finn Thain
@ 2016-05-13  7:43                   ` Sumit Saxena
  2016-05-13  9:25                     ` Finn Thain
  0 siblings, 1 reply; 20+ messages in thread
From: Sumit Saxena @ 2016-05-13  7:43 UTC (permalink / raw)
  To: Finn Thain
  Cc: Dan Carpenter, Petros Koutoupis, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi

> -----Original Message-----
> From: Finn Thain [mailto:fthain@telegraphics.com.au]
> Sent: Friday, May 13, 2016 1:14 PM
> To: Sumit Saxena
> Cc: Dan Carpenter; Petros Koutoupis; kashyap.desai@avagotech.com;
> sumit.saxena@avagotech.com; uday.lingala@avagotech.com;
> megaraidlinux.pdl@avagotech.com; linux-scsi@vger.kernel.org
> Subject: RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
>
>
> On Thu, 12 May 2016, Sumit Saxena wrote:
>
> > > From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> > >
> > > Also when I'm doing static analysis people always tell me that "that
> > > bug is impossible, trust me." and instead of trusting people I
> > > really wish they would just show me the relevant code that prevents
> > > it from happening.
> >
> > Inside megasas_build_io_fusion() function, driver sets "cmd->scmd"
> > pointer(SCSI command pointer received from SCSI mid layer). Functions
> > called inside megasas_build_io_fusion()(which actually builds frame to
> > be sent to firmware) are setting Function type-
> > MPI2_FUNCTION_SCSI_IO_REQUEST (or)
> MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST.
> > So in case Function type set to any one these two, there must be valid
> > "cmd->scmd".
>
> That doesn't show what prevents the bug. It merely shows that the bug
does not
> always manifest.
>
> For example, you might check whether anything prevents
> megasas_build_io_fusion() from returning before assigning cmd->scmd,
like
> so:
>
> 2112 	if (sge_count > instance->max_num_sge) {
> 2113 		dev_err(&instance->pdev->dev, "Error. sge_count (0x%x)
> exceeds "
> 2114 		       "max (0x%x) allowed\n", sge_count,
> 2115 		       instance->max_num_sge);
> 2116 		return 1;
> 2117 	}
>
> Another possibility: cmd->io_request->Function is valid yet cmd->scmd is
NULL
> when seen from the interrupt handler if it intervenes between the two
> statements in megasas_return_cmd_fusion():

For IOs returned by above error are not actually fired to firmware so
there will be no interrupt handler called for the same.
Anyways, if anyone has logs pertaining to the failure, please attach..
>
> 180 inline void megasas_return_cmd_fusion(struct megasas_instance
*instance,
> 181 	struct megasas_cmd_fusion *cmd)
> 182 {
> 183 	cmd->scmd = NULL;
> 184 	memset(cmd->io_request, 0, sizeof(struct
> MPI2_RAID_SCSI_IO_REQUEST));
> 185 }
>
> You might want to confirm that locking always prevents that.
>
> OTOH, without an actual backtrace, I too might be reluctant to pursue
this kind
> of speculation.
>
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-13  7:43                   ` Sumit Saxena
@ 2016-05-13  9:25                     ` Finn Thain
  0 siblings, 0 replies; 20+ messages in thread
From: Finn Thain @ 2016-05-13  9:25 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Dan Carpenter, Petros Koutoupis, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi


On Fri, 13 May 2016, Sumit Saxena wrote:

> 
> For IOs returned by above error are not actually fired to firmware so 
> there will be no interrupt handler called for the same.

You don't need both possibilities to hold. Either one would do it!

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
  2016-05-12  6:35             ` Dan Carpenter
  2016-05-12 12:35               ` Sumit Saxena
@ 2016-05-13 23:34               ` Petros Koutoupis
  1 sibling, 0 replies; 20+ messages in thread
From: Petros Koutoupis @ 2016-05-13 23:34 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sumit Saxena, Finn Thain, kashyap.desai, sumit.saxena,
	uday.lingala, megaraidlinux.pdl, linux-scsi

On Thu, 2016-05-12 at 09:35 +0300, Dan Carpenter wrote:
> On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote:
> > Sumit,
> > 
> > I will resubmit the patch with all the recommendations. Thank you. In case
> > you are interested, I have a crash file showcasing the error. I can always
> > provide this outside of this mailing thread.
> > 
> 
> Please send it to the thread.
> 
> To be honest, I totally can't understand this thread.  Sumit says it is
> impossible and you are saying that you have seen it happen in real life.
> Are you using out of tree code or something?  What are you doing that is
> unexpected?
> 
> I don't see the point of adding a WARN_ON().  NULL derefs normally
> generate a pretty clear stack trace already (unless they are caused by
> memory corruption).  Why are we not just fixing the bugs instead of
> warning and then crashing.
> 
> Also when I'm doing static analysis people always tell me that "that bug
> is impossible, trust me." and instead of trusting people I really wish
> they would just show me the relevant code that prevents it from
> happening.
> 
> regards,
> dan carpenter
> 
All,

You will find the tarball containing the crash and kernel images here:
https://drive.google.com/open?id=0B771xrWtHcpWWFI3Yl93WFBfT28

Note that it is a 7GB file.

--
Petros


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-05-13 23:34 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-08  0:40 [PATCH] megaraid: add scsi_cmnd NULL check before use Petros Koutoupis
2016-05-08  3:32 ` Finn Thain
2016-05-08 12:08   ` Petros Koutoupis
2016-05-08 12:22     ` Finn Thain
2016-05-08 16:34       ` Petros Koutoupis
2016-05-09  1:35         ` Julian Calaby
2016-05-09  2:59           ` Petros Koutoupis
2016-05-09  8:05   ` Dan Carpenter
2016-05-09  9:43     ` Dan Carpenter
2016-05-09  9:48     ` Sumit Saxena
2016-05-09 19:29       ` Dan Carpenter
2016-05-09 21:28       ` Petros Koutoupis
2016-05-11  9:41         ` Sumit Saxena
2016-05-12  1:49           ` Petros Koutoupis
2016-05-12  6:35             ` Dan Carpenter
2016-05-12 12:35               ` Sumit Saxena
2016-05-13  7:43                 ` Finn Thain
2016-05-13  7:43                   ` Sumit Saxena
2016-05-13  9:25                     ` Finn Thain
2016-05-13 23:34               ` Petros Koutoupis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).