From: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
To: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>,
linux-scsi <linux-scsi@vger.kernel.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Frederic Barrat <fbarrat@linux.vnet.ibm.com>,
"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>,
Ian Munsie <imunsie@au1.ibm.com>,
Andrew Donnellan <andrew.donnellan@au1.ibm.com>,
Brian King <brking@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org,
Christophe Lombard <clombard@linux.vnet.ibm.com>
Subject: Re: [PATCH 04/14] cxlflash: Avoid command room violation
Date: Thu, 17 Nov 2016 16:30:55 -0600 [thread overview]
Message-ID: <4ad82a68-9b4a-cc3b-e761-2e16f61617b5@linux.vnet.ibm.com> (raw)
In-Reply-To: <1A9CB955-077B-4B81-840D-9E268DE9B914@linux.vnet.ibm.com>
Thanks for catching this Matt. Looking into this. Will send out a V2.
On 11/17/2016 1:36 PM, Matthew R. Ochs wrote:
> Hi Uma,
>
> I do see a potential hang issue with this patch. See my comments below.
>
>
> -matt
>
>> On Nov 15, 2016, at 5:14 PM, Uma Krishnan <ukrishn@linux.vnet.ibm.com> wrote:
>>
>> During test, a command room violation interrupt is occasionally seen
>> for the master context when the CXL flash devices are stressed.
>>
>> After studying the code, there could be gaps in the way command room
>> value is being cached in cxlflash. When the cached command room is zero
>> the thread attempting to send becomes burdened with updating the cached
>> value with the actual value from the AFU. Today, this is handled with
>> an atomic set operation of the raw value read. Following the atomic
>> update, the thread proceeds to send.
>>
>> This behavior is incorrect on two counts:
>>
>> - The update fails to take into account the current thread and its
>> consumption of one of the hardware commands.
>>
>> - The update does not take into account other threads also atomically
>> updating. Per design, a worker thread updates the cached value when
>> a send thread times out. By not performing an atomic compare/exchange,
>> the cached value can be incorrectly clobbered.
>>
>> To correct these issues, the runtime updates of the cached command room
>> are updated to use atomic64_cmpxchg() and the send routine is updated to
>> take into account the current thread consuming a hardware command.
>>
>> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
>> ---
>> drivers/scsi/cxlflash/main.c | 16 ++++++++++------
>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>> index 6d33d8c..1a32e8b 100644
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -322,9 +322,10 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
>> if (!newval) {
>
> When this path is invoked, the current thread is consuming the last entry
> available entry before the room must be read again. While the change
> below is fine for circumstances where the hardware queue has room for
> more than one command, consider a scenario where the queue has room
> for only 1 command (the command that you just consumed via the atomic
> but are not really consuming with a MMIO due to the revised goto).
>
> In such a scenario this code would loop endlessly, bypassing the timeout
> logic completely, until the read room reflected a value greater than 1.
>
>> do {
>> room = readq_be(&afu->host_map->cmd_room);
>> - atomic64_set(&afu->room, room);
>> - if (room)
>> - goto write_ioarrin;
>> + if (room) {
>> + atomic64_cmpxchg(&afu->room, 0, room);
>> + goto retry;
>> + }
>
> If you instead fully consume the entry (goto write_ioarrin - similar as it was
> before) and take into account the consumption when you update the cached
> value (i.e.: cmpxchg(..., 0, room - 1) the scenario described above will not occur.
>
>
next prev parent reply other threads:[~2016-11-17 22:30 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-15 23:12 [PATCH 00/14] cxlflash: Fixes, enhancements, cleanup and staging Uma Krishnan
2016-11-15 23:13 ` [PATCH 01/14] cxlflash: Set sg_tablesize to 1 instead of SG_NONE Uma Krishnan
2016-11-17 19:20 ` Matthew R. Ochs
2016-11-15 23:14 ` [PATCH 02/14] cxlflash: Fix crash in cxlflash_restore_luntable() Uma Krishnan
2016-11-17 19:20 ` Matthew R. Ochs
2016-11-15 23:14 ` [PATCH 03/14] cxlflash: Improve context_reset() logic Uma Krishnan
2016-11-17 19:21 ` Matthew R. Ochs
2016-11-15 23:14 ` [PATCH 04/14] cxlflash: Avoid command room violation Uma Krishnan
2016-11-17 19:36 ` Matthew R. Ochs
2016-11-17 22:30 ` Uma Krishnan [this message]
2016-11-15 23:14 ` [PATCH 05/14] cxlflash: Remove unused buffer from AFU command Uma Krishnan
2016-11-15 23:14 ` [PATCH 06/14] cxlflash: Allocate memory instead of using command pool for AFU sync Uma Krishnan
2016-11-15 23:15 ` [PATCH 07/14] cxlflash: Use cmd_size for private commands Uma Krishnan
2016-11-15 23:15 ` [PATCH 08/14] cxlflash: Remove private command pool Uma Krishnan
2016-11-15 23:15 ` [PATCH 09/14] cxlflash: Wait for active AFU commands to timeout upon tear down Uma Krishnan
2016-11-15 23:15 ` [PATCH 10/14] cxlflash: Remove AFU command lock Uma Krishnan
2016-11-15 23:15 ` [PATCH 11/14] cxlflash: Cleanup send_tmf() Uma Krishnan
2016-11-15 23:15 ` [PATCH 12/14] cxlflash: Cleanup queuecommand() Uma Krishnan
2016-11-15 23:16 ` [PATCH 13/14] cxlflash: Migrate IOARRIN specific routines to function pointers Uma Krishnan
2016-11-15 23:16 ` [PATCH 14/14] cxlflash: Migrate scsi command pointer to AFU command Uma Krishnan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ad82a68-9b4a-cc3b-e761-2e16f61617b5@linux.vnet.ibm.com \
--to=ukrishn@linux.vnet.ibm.com \
--cc=andrew.donnellan@au1.ibm.com \
--cc=brking@linux.vnet.ibm.com \
--cc=clombard@linux.vnet.ibm.com \
--cc=fbarrat@linux.vnet.ibm.com \
--cc=imunsie@au1.ibm.com \
--cc=jejb@linux.vnet.ibm.com \
--cc=linux-scsi@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=manoj@linux.vnet.ibm.com \
--cc=martin.petersen@oracle.com \
--cc=mrochs@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).