Message-ID: <4BF951BE.1010402@redhat.com>
Date: Sun, 23 May 2010 19:03:10 +0300
From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
References: <20100505205814.GA7090@redhat.com> <4BF39C12.7090407@redhat.com>
 <201005201431.51142.rusty@rustcorp.com.au> <201005201438.17010.rusty@rustcorp.com.au>
 <20100523153134.GA14646@redhat.com> <4BF94CAD.5010504@redhat.com>
 <20100523155132.GA14733@redhat.com>
In-Reply-To: <20100523155132.GA14733@redhat.com>
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, Rusty Russell, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, virtualization@lists.linux-foundation.org

On 05/23/2010 06:51 PM, Michael S. Tsirkin wrote:
>>> So locked version seems to be faster than unlocked,
>>> and share/unshare not to matter?
>>>
>> May be due to the processor using the LOCK operation as a hint to
>> reserve the cacheline for a bit.
>>
> Maybe we should use atomics on index then?
>

This should only be helpful if you access the cacheline several times in
a row. That's not the case in virtio (or here).

I think the problem is that LOCKSHARE and SHARE are not symmetric, so
they can't be directly compared.

> OK, after adding mb in code patch will be sent separately,
> the test works for my workstation. locked is still fastest,
> unshared sometimes shows wins and sometimes loses over shared.
>
> [root@qus19 ~]# ./cachebounce share 0 1
> CPU 0: share cacheline: 6638521 usec
> CPU 1: share cacheline: 6638478 usec
>

66 ns? nice.

> [root@qus19 ~]# ./cachebounce share 0 2
> CPU 0: share cacheline: 14529198 usec
> CPU 2: share cacheline: 14529156 usec
>

140 ns, not too bad. I hope I'm not misinterpreting the results.

-- 
error compiling committee.c: too many arguments to function
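
A rough check of the per-iteration figures quoted above, as a sketch only. The thread does not state how many iterations cachebounce runs; 100,000,000 is an assumption, picked because it reproduces the 66 ns figure, and the function name is made up for illustration.

```python
# ASSUMPTION: the iteration count is not given anywhere in the thread.
# 100,000,000 is a guess that happens to reproduce the quoted 66 ns.
ASSUMED_ITERATIONS = 100_000_000


def ns_per_iteration(total_usec: int, iterations: int = ASSUMED_ITERATIONS) -> float:
    """Convert a total runtime in microseconds to nanoseconds per iteration."""
    return total_usec * 1000 / iterations


if __name__ == "__main__":
    # Totals taken from the quoted cachebounce output (CPU 0 lines).
    for label, usec in (("share 0 1", 6638521), ("share 0 2", 14529198)):
        print(f"{label}: {ns_per_iteration(usec):.1f} ns per iteration")
```

Under that assumed count the two runs come out at roughly 66 ns and 145 ns per iteration, in line with the rounded figures in the replies above. A different iteration count would scale both numbers by the same factor.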