From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Bolkhovitin
Subject: Re: [PATCH] IB/srp: use multiple CPU cores more effectively
Date: Mon, 02 Aug 2010 17:08:06 +0400
Message-ID: <4C56C336.4040009@vlnb.net>
References: <201008021015.40472.bvanassche@acm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <201008021015.40472.bvanassche-HInyCGIudOg@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Bart Van Assche
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, David Dillow, Ralph Campbell
List-Id: linux-rdma@vger.kernel.org

Bart Van Assche, on 08/02/2010 12:15 PM wrote:
> SRP I/O with small block sizes causes a high CPU load. Processing IB
> completions on the context of a kernel thread instead of in interrupt context
> allows to process up to 25% more I/O operations per second. This patch does
> add a kernel parameter 'thread' that allows to specify whether to process IB
> completions in interrupt context or in kernel thread context. Also, the IB
> receive notification processing loop is rewritten as proposed earlier by Ralph
> Campbell (see also https://patchwork.kernel.org/patch/89426/). As the
> measurement results below show, rewriting the IB receive notification
> processing loop did not have a measurable impact on performance. Processing
> IB receive notifications in thread context however does have a measurable
> impact: workloads with I/O depth one are processed at most 10% slower and
> workloads with larger I/O depths are processed up to 25% faster.
>
> block size   number of     IOPS        IOPS       IOPS
>  in bytes     threads     without      with       with
>   ($bs)     ($numjobs)   this patch  thread=n   thread=y
>      512         1         25,400     25,400     23,100
>      512       128        122,000    122,000    153,000
>     4096         1         25,000     25,000     22,700
>     4096       128        122,000    121,000    157,000
>    65536         1         14,300     14,400     13,600
>    65536         4         36,700     36,700     36,600
>   524288         1          3,470      3,430      3,420
>   524288         4          5,020      5,020      4,990
>
> performance test used to gather the above results:
>   fio --bs=${bs} --ioengine=sg --buffered=0 --size=128M --rw=read \
>       --thread --numjobs=${numjobs} --loops=100 --group_reporting \
>       --gtod_reduce=1 --name=${dev} --filename=${dev}
> other ib_srp kernel module parameters: srp_sg_tablesize=128

How about results of "dd Xflags=direct" in different modes, to find out
the lowest latency with which the driver can process 512-byte and 4K
packets?

Sorry, I don't trust fio when it comes to precise latency measurements.

Vlad
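
For reference, the dd-based measurement suggested above might look
roughly like the sketch below. It is only an illustration under several
assumptions that are not in the original mail: GNU dd and GNU date are
available, "Xflags=direct" is read as iflag=direct for a read test, and
the helper names mean_latency_us and measure_read are hypothetical.

```sh
#!/bin/sh
# Sketch: estimate mean per-request read latency using direct-I/O dd.
# Assumes GNU dd (iflag=direct) and GNU date (%N); helper names are
# hypothetical and do not come from the original thread.

# mean_latency_us COUNT SECONDS
# Print the mean time per request in microseconds.
mean_latency_us() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", s * 1000000 / n }'
}

# measure_read DEV BS COUNT
# Time COUNT sequential direct reads of BS bytes each from DEV and
# print the mean latency per read in microseconds.
measure_read() {
    start=$(date +%s.%N)
    dd if="$1" of=/dev/null bs="$2" count="$3" iflag=direct 2>/dev/null
    end=$(date +%s.%N)
    elapsed=$(awk -v a="$start" -v b="$end" 'BEGIN { print b - a }')
    mean_latency_us "$3" "$elapsed"
}
```

Usage would be something like "measure_read ${dev} 512 10000" and
"measure_read ${dev} 4096 10000", once with thread=n and once with
thread=y. Since dd issues one request at a time, the result approximates
the I/O-depth-one latency. The same conversion applied to the table
above, assuming one outstanding request per job, gives about 39.4 us
per request for 25,400 IOPS ("mean_latency_us 25400 1").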