From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4BFA1E91.8060706@redhat.com>
Date: Mon, 24 May 2010 09:37:05 +0300
From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
References: <20100505205814.GA7090@redhat.com> <4BF39C12.7090407@redhat.com>
 <201005201431.51142.rusty@rustcorp.com.au> <201005201438.17010.rusty@rustcorp.com.au>
 <20100523153134.GA14646@redhat.com> <4BF94CAD.5010504@redhat.com>
 <20100523155132.GA14733@redhat.com> <4BF951BE.1010402@redhat.com>
 <20100523163039.GC14733@redhat.com>
In-Reply-To: <20100523163039.GC14733@redhat.com>
List-Id: qemu-devel.nongnu.org
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, Rusty Russell, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, virtualization@lists.linux-foundation.org

On 05/23/2010 07:30 PM, Michael S. Tsirkin wrote:
>>> Maybe we should use atomics on index then?
>>>
>> This should only be helpful if you access the cacheline several times in
>> a row. That's not the case in virtio (or here).
>>
> So why does it help?
>
> We actually do access the cacheline several times in a row here (but not
> in virtio?):
>
>     case SHARE:
>         while (count < MAX_BOUNCES) {
>             /* Spin waiting for other side to change it. */
>             while (counter->cacheline1 != count);
>             /* Broadcast a read request. */
>             count++;
>             counter->cacheline1 = count;
>             /* Broadcast an invalidate request. */
>             count++;
>         }
>         break;
>
>     case LOCKSHARE:
>         while (count < MAX_BOUNCES) {
>             /* Spin waiting for other side to change it. */
>             while (__sync_val_compare_and_swap(&counter->cacheline1,
>                                                count, count+1) != count);
>             /* Broadcast a 'read for ownership' request. */
>             count += 2;
>         }
>         break;
>

So RMW should certainly be faster using single-instruction RMW operations
(or using prefetchw).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
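
A minimal, self-contained sketch of the cache-line ping-pong test quoted above,
for readers who want to reproduce the SHARE vs. LOCKSHARE comparison: the two
cases mirror the quoted fragments, but the harness around them (the bounce and
run functions, the MAX_BOUNCES value, the thread setup and clock_gettime
timing) is an illustrative reconstruction, not the original test program.

/*
 * Hypothetical reconstruction of the ping-pong benchmark quoted above.
 * Two threads bounce one cache line between them, either with a plain
 * load/store handshake (SHARE) or with a compare-and-swap (LOCKSHARE).
 * Build with: gcc -O2 -pthread bounce.c  (older glibc may also need -lrt)
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define MAX_BOUNCES 1000000UL   /* illustrative value, not from the original test */

enum mode { SHARE, LOCKSHARE };

/* Keep the shared counter on its own cache line. */
struct counter {
    volatile unsigned long cacheline1;
} __attribute__((aligned(64)));

static struct counter counter;

struct arg {
    enum mode mode;
    unsigned long start;        /* 0 for one thread, 1 for the other */
};

static void *bounce(void *p)
{
    struct arg *a = p;
    unsigned long count = a->start;

    switch (a->mode) {
    case SHARE:
        while (count < MAX_BOUNCES) {
            /* Spin waiting for the other side to change it: a read request. */
            while (counter.cacheline1 != count)
                ;
            count++;
            /* Plain store of the new value: an invalidate request. */
            counter.cacheline1 = count;
            count++;
        }
        break;

    case LOCKSHARE:
        while (count < MAX_BOUNCES) {
            /* One RMW per bounce: a single 'read for ownership' request. */
            while (__sync_val_compare_and_swap(&counter.cacheline1,
                                               count, count + 1) != count)
                ;
            count += 2;
        }
        break;
    }
    return NULL;
}

static double run(enum mode mode)
{
    pthread_t t1, t2;
    struct arg a1 = { mode, 0 }, a2 = { mode, 1 };
    struct timespec t0, t;

    counter.cacheline1 = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&t1, NULL, bounce, &a1);
    pthread_create(&t2, NULL, bounce, &a2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (t.tv_sec - t0.tv_sec) + (t.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("SHARE:     %.3f s\n", run(SHARE));
    printf("LOCKSHARE: %.3f s\n", run(LOCKSHARE));
    return 0;
}

The closing remark about single-instruction RMW refers to operations like
__sync_fetch_and_add (lock xadd on x86), which take ownership of the line with
a single read-for-ownership and never need a compare-and-swap retry loop;
prefetchw gets a similar effect by requesting the line in exclusive state
before the plain write.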