Re: [PATCH] ibmvscsi: fix race potential race after loss of transport

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Tyrel Datwyler <tyreld@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	james.bottomley@hansenpartnership.com
Cc: brking@linux.ibm.com, linuxppc-dev@lists.ozlabs.org,
	linux-scsi@vger.kernel.org, martin.petersen@oracle.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ibmvscsi: fix race potential race after loss of transport
Date: Wed, 28 Oct 2020 18:28:20 -0700	[thread overview]
Message-ID: <d527dffd-1af2-7da0-d1f1-1f192a537aed@linux.ibm.com> (raw)
In-Reply-To: <87o8knvsb1.fsf@mpe.ellerman.id.au>

On 10/27/20 10:21 PM, Michael Ellerman wrote:
> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> After a loss of tranport due to an adatper migration or crash/disconnect from
>> the host partner there is a tiny window where we can race adjusting the
>> request_limit of the adapter. The request limit is atomically inc/dec to track
>> the number of inflight requests against the allowed limit of our VIOS partner.
>> After a transport loss we set the request_limit to zero to reflect this state.
>> However, there is a window where the adapter may attempt to queue a command
>> because the transport loss event hasn't been fully processed yet and
>> request_limit is still greater than zero. The hypercall to send the event will
>> fail and the error path will increment the request_limit as a result. If the
>> adapter processes the transport event prior to this increment the request_limit
>> becomes out of sync with the adapter state and can result in scsi commands being
>> submitted on the now reset connection prior to an SRP Login resulting in a
>> protocol violation.
>>
>> Fix this race by protecting request_limit with the host lock when changing the
>> value via atomic_set() to indicate no transport.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>>  drivers/scsi/ibmvscsi/ibmvscsi.c | 36 +++++++++++++++++++++++---------
>>  1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> index b1f3017b6547..188ed75417a5 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> @@ -806,6 +806,22 @@ static void purge_requests(struct ibmvscsi_host_data *hostdata, int error_code)
>>  	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>>  }
>>  
>> +/**
>> + * ibmvscsi_set_request_limit - Set the adapter request_limit in response to
>> + * an adapter failure, reset, or SRP Login. Done under host lock to prevent
>> + * race with scsi command submission.
>> + * @hostdata:	adapter to adjust
>> + * @limit:	new request limit
>> + */
>> +static void ibmvscsi_set_request_limit(struct ibmvscsi_host_data *hostdata, int limit)
>> +{
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(hostdata->host->host_lock, flags);
>> +	atomic_set(&hostdata->request_limit, limit);
>> +	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>> +}
>> +
>>  /**
>>   * ibmvscsi_reset_host - Reset the connection to the server
>>   * @hostdata:	struct ibmvscsi_host_data to reset
> ...
>> @@ -2137,12 +2153,12 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>>  	}
>>  
>>  	hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
>> +	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
> 
> You drop the lock ...
> 
>>  	if (rc) {
>> -		atomic_set(&hostdata->request_limit, -1);
>> +		ibmvscsi_set_request_limit(hostdata, -1);
> 
> .. then retake it, then drop it again in ibmvscsi_set_request_limit().
> 
> Which introduces the possibility that something else gets the lock
> before you can set the limit to -1.
> 
> I'm not sure that's a bug, but it's not obviously correct either?

Yeah, I'd already moved the request_limit update into its own function before I
got to this case which made me a bit uneasy when I realized I had to drop the
lock because my new function takes the lock. However, we only need to protect
ourselves from from racing with queuecommand() which is locked for its entire
call. Further, if we've gotten here it means we were either resetting or
re-enabling the adapter which would have already set request_limit to zero. At
this point the transport was already gone and we've further failed to reset it.
Also, we've blocked any new scsi requests at this point.

-Tyrel

> 
> cheers
> 
>>  		dev_err(hostdata->dev, "error after %s\n", action);
>>  	}
>> -	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>>  
>>  	scsi_unblock_requests(hostdata->host);
>>  }

     prev parent reply	other threads:[~2020-10-29  1:30 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-25  0:13 [PATCH] ibmvscsi: fix race potential race after loss of transport Tyrel Datwyler
2020-10-26 22:17 ` Martin K. Petersen
2020-10-28  5:21 ` Michael Ellerman
2020-10-29  1:28   ` Tyrel Datwyler [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d527dffd-1af2-7da0-d1f1-1f192a537aed@linux.ibm.com \
    --to=tyreld@linux.ibm.com \
    --cc=brking@linux.ibm.com \
    --cc=james.bottomley@hansenpartnership.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=martin.petersen@oracle.com \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).