From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hannes Reinecke
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
Date: Fri, 09 Jan 2015 19:28:13 +0100
Message-ID: <54B01DBD.5020707@suse.de>
References: <54AD5DDD.2090808@dev.mellanox.co.il>
 <54AD6563.4040603@suse.de>
 <54ADA777.6090801@cs.wisc.edu>
 <54AE36CE.8020509@acm.org>
 <1420755361.2842.16.camel@haakon3.risingtidesystems.com>
 <1420756142.11310.9.camel@HansenPartnership.com>
 <1420757822.2842.39.camel@haakon3.risingtidesystems.com>
 <1420759360.11310.13.camel@HansenPartnership.com>
 <1420779808.21830.21.camel@haakon3.risingtidesystems.com>
 <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4@cs.wisc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4@cs.wisc.edu>
Sender: target-devel-owner@vger.kernel.org
To: Michael Christie , "Nicholas A. Bellinger"
Cc: James Bottomley , lsf-pc@lists.linux-foundation.org, Bart Van Assche , linux-scsi , Sagi Grimberg , target-devel , open-iscsi@googlegroups.com
List-Id: linux-scsi@vger.kernel.org

On 01/09/2015 07:00 PM, Michael Christie wrote:
>
> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger wrote:
>
>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>
>>>> The point is that a simple session wide counter for command sequence
>>>> number assignment is significantly less overhead than all of the
>>>> overhead associated with running a full multipath stack atop multiple
>>>> sessions.
>>>
>>> I don't see how that's relevant to issue speed, which was the measure we
>>> were using: The layers above are just a hopper. As long as they're
>>> loaded, the MQ lower layer can issue at full speed.
>>> So as long as the multipath hopper is efficient enough to keep the
>>> queues loaded there's no speed degradation.
>>>
>>> The problem with a sequence point inside the MQ issue layer is that it
>>> can cause a stall that reduces the issue speed. so the counter sequence
>>> point causes a degraded issue speed over the multipath hopper approach
>>> above even if the multipath approach has a higher CPU overhead.
>>>
>>> Now, if the system is close to 100% cpu already, *then* the multipath
>>> overhead will try to take CPU power we don't have and cause a stall, but
>>> it's only in the flat out CPU case.
>>>
>>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>>> isn't adding any additional synchronization overhead vs. what's already
>>>> in place.
>>>
>>> I'll leave it up to the iSER people to decide whether they're redoing
>>> this as part of the MQ work.
>>>
>>
>> Session wide command sequence number synchronization isn't something to
>> be removed as part of the MQ work. It's a iSCSI/iSER protocol
>> requirement.
>>
>> That is, the expected + maximum sequence numbers are returned as part of
>> every response PDU, which the initiator uses to determine when the
>> command sequence number window is open so new non-immediate commands may
>> be sent to the target.
>>
>> So, given some manner of session wide synchronization is required
>> between different contexts for the existing single connection case to
>> update the command sequence number and check when the window opens, it's
>> a fallacy to claim MC/S adds some type of new initiator specific
>> synchronization overhead vs. single connection code.
>
> I think you are assuming we are leaving the iscsi code as it is today.
>
> For the non-MCS mq session per CPU design, we would be allocating and
> binding the session and its resources to specific CPUs.
> They would only be accessed by the threads on that one CPU, so we get
> our serialization/synchronization from that. That is why we are saying we
> do not need something like atomic_t/spin_locks for the sequence number
> handling for this type of implementation.
>
Wouldn't that need to be coordinated with the networking layer?
Doesn't it do the same thing, matching TX/RX queues to CPUs?
If so, wouldn't we decrease bandwidth by restricting things to one CPU?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
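
For readers following the thread: the command sequence number window
mentioned in the quoted text can be sketched roughly as below. This is
only an illustration, not the open-iscsi or LIO code. The struct and
function names (sess_window, sna_lte, window_open, assign_cmdsn,
update_window) are made up for this example. The sketch deliberately
contains no locking; whether these fields are protected by a session
wide lock, an atomic counter, or by binding the session to a single
CPU is exactly the question under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session state; all names are illustrative only. */
struct sess_window {
    uint32_t next_cmdsn; /* CmdSN for the next non-immediate command */
    uint32_t exp_cmdsn;  /* newest ExpCmdSN seen in a response PDU */
    uint32_t max_cmdsn;  /* newest MaxCmdSN seen in a response PDU */
};

/* Serial-number "a <= b" for 32-bit counters that may wrap around. */
static bool sna_lte(uint32_t a, uint32_t b)
{
    return (uint32_t)(b - a) < 0x80000000u;
}

/* The window is open while the next CmdSN does not exceed MaxCmdSN. */
static bool window_open(const struct sess_window *s)
{
    return sna_lte(s->next_cmdsn, s->max_cmdsn);
}

/* Hand out the next CmdSN, or return false if the window is closed. */
static bool assign_cmdsn(struct sess_window *s, uint32_t *cmdsn)
{
    if (!window_open(s))
        return false;
    *cmdsn = s->next_cmdsn++;
    return true;
}

/* Fold in ExpCmdSN/MaxCmdSN from a response PDU. */
static void update_window(struct sess_window *s, uint32_t exp, uint32_t max)
{
    /* MaxCmdSN below ExpCmdSN - 1 is not a valid pair: ignore both. */
    if (!sna_lte(exp - 1, max))
        return;
    /* Only move forward, so responses arriving out of order are harmless. */
    if (sna_lte(s->exp_cmdsn, exp))
        s->exp_cmdsn = exp;
    if (sna_lte(s->max_cmdsn, max))
        s->max_cmdsn = max;
}
```

Every submitter calls assign_cmdsn() and every response handler calls
update_window() on the same three fields, which is why some form of
serialization is needed as soon as more than one context touches a
session.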