From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bart Van Assche Subject: Re: [PATCH 3/3] tcm ibmvscsis driver Date: Tue, 15 Feb 2011 20:20:53 +0100 Message-ID: References: <20110214181129X.fujita.tomonori@lab.ntt.co.jp> <20110215124208B.fujita.tomonori@lab.ntt.co.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Return-path: Received: from mail-gx0-f174.google.com ([209.85.161.174]:52120 "EHLO mail-gx0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751792Ab1BOTVO (ORCPT ); Tue, 15 Feb 2011 14:21:14 -0500 Received: by gxk9 with SMTP id 9so255734gxk.19 for ; Tue, 15 Feb 2011 11:21:14 -0800 (PST) In-Reply-To: <20110215124208B.fujita.tomonori@lab.ntt.co.jp> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: FUJITA Tomonori Cc: linux-scsi@vger.kernel.org, brking@linux.vnet.ibm.com On Tue, Feb 15, 2011 at 4:42 AM, FUJITA Tomonori wrote: > On Mon, 14 Feb 2011 12:50:40 +0100 > Bart Van Assche wrote: > >> - send_rsp() sends back an SRP response with req_lim_delta = 1 before >> srp_iu_put() is invoked. I think this is a race condition: it can >> happen that the SRP initiator has received an SRP_RSP and sends a new >> SRP_CMD before srp_iu_put() was invoked. On a heavily loaded system >> that can trigger a queue overflow in the target. > > Yeah, we had better to handle such case better. > > But I don't think that we hit a critical event such as crash, memory > corruption, etc, with the current code. > > In such case, srp_iu_get return NULL, so the target ignores the > request, then eventually the initiator recovers. Sorry but I do not agree that hitting this race is harmless. If this race gets triggered the credit associated with the lost SRP information unit will never be returned to the initiator. If this happens frequently enough (INITIAL_SRP_LIMIT times), the initiator will run out of credits and will lock up forever. > Probably, we should set the queue size to INITIAL_SRP_LIMIT + 1. With the current implementation of the SCSI storage target core that's probably sufficient. But if that core is ever modified such that multiple SCSI commands can be processed concurrently, that limit wouldn't be sufficient anymore because then the described race condition could be triggered in multiple threads simultaneously - at least in theory. An alternative solution is to modify send_iu() such that srp_iu_put() is invoked after all relevant data has been copied via h_copy_rdma() and to the CRQ and before h_send_crq() is invoked. That approach has e.g. been implemented in this patch: http://www.spinics.net/lists/linux-scsi/msg49105.html. Bart.