Re: [PATCH 04/14] cxlflash: Avoid command room violation

From: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
To: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Frederic Barrat <fbarrat@linux.vnet.ibm.com>,
	"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>,
	Ian Munsie <imunsie@au1.ibm.com>,
	Andrew Donnellan <andrew.donnellan@au1.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Christophe Lombard <clombard@linux.vnet.ibm.com>
Subject: Re: [PATCH 04/14] cxlflash: Avoid command room violation
Date: Thu, 17 Nov 2016 16:30:55 -0600
Message-ID: <4ad82a68-9b4a-cc3b-e761-2e16f61617b5@linux.vnet.ibm.com>
In-Reply-To: <1A9CB955-077B-4B81-840D-9E268DE9B914@linux.vnet.ibm.com>

Thanks for catching this, Matt. I am looking into it and will send out a V2.

On 11/17/2016 1:36 PM, Matthew R. Ochs wrote:
> Hi Uma,
>
> I do see a potential hang issue with this patch. See my comments below.
>
>
> -matt
>
>> On Nov 15, 2016, at 5:14 PM, Uma Krishnan <ukrishn@linux.vnet.ibm.com> wrote:
>>
>> During test, a command room violation interrupt is occasionally seen
>> for the master context when the CXL flash devices are stressed.
>>
>> After studying the code, there could be gaps in the way command room
>> value is being cached in cxlflash. When the cached command room is zero
>> the thread attempting to send becomes burdened with updating the cached
>> value with the actual value from the AFU. Today, this is handled with
>> an atomic set operation of the raw value read. Following the atomic
>> update, the thread proceeds to send.
>>
>> This behavior is incorrect on two counts:
>>
>>   - The update fails to take into account the current thread and its
>>     consumption of one of the hardware commands.
>>
>>   - The update does not take into account other threads also atomically
>>     updating. Per design, a worker thread updates the cached value when
>>     a send thread times out. By not performing an atomic compare/exchange,
>>     the cached value can be incorrectly clobbered.
>>
>> To correct these issues, the runtime updates of the cached command room
>> now use atomic64_cmpxchg() and the send routine is updated to take into
>> account the current thread consuming a hardware command.
>>
>> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
>> ---
>> drivers/scsi/cxlflash/main.c | 16 ++++++++++------
>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>> index 6d33d8c..1a32e8b 100644
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -322,9 +322,10 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
>> 	if (!newval) {
>
> When this path is invoked, the current thread is consuming the last
> available entry before the room must be read again. While the change
> below is fine for circumstances where the hardware queue has room for
> more than one command, consider a scenario where the queue has room
> for only 1 command (the command that you just consumed via the atomic
> but are not really consuming with an MMIO due to the revised goto).
>
> In such a scenario this code would loop endlessly, bypassing the timeout
> logic completely, until the read room reflected a value greater than 1.
>
>> 		do {
>> 			room = readq_be(&afu->host_map->cmd_room);
>> -			atomic64_set(&afu->room, room);
>> -			if (room)
>> -				goto write_ioarrin;
>> +			if (room) {
>> +				atomic64_cmpxchg(&afu->room, 0, room);
>> +				goto retry;
>> +			}
>
> If you instead fully consume the entry (goto write_ioarrin - similar to how it was
> before) and take into account the consumption when you update the cached
> value (i.e., cmpxchg(..., 0, room - 1)), the scenario described above will not occur.
>
>
