From: Paolo Bonzini
To: "Emilio G. Cota", Pranith Kumar
Cc: Richard Henderson, Alex Bennée, "open list:All patches CC here", Markus Armbruster
Subject: Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line
Date: Tue, 25 Oct 2016 22:58:00 +0200
Message-ID: <7012eb7a-929a-519b-cf0d-48c62d594589@redhat.com>
In-Reply-To: <20161025204515.GA22860@flamenco>
References: <20161025153507.27110-1-bobby.prani@gmail.com>
 <5b91345d-fea6-8d98-69a9-eda1e256c3e1@redhat.com>
 <54c9a235-4ab4-1731-b076-116ecde91442@redhat.com>
 <87shrkckfv.fsf@gmail.com>
 <5190ece5-7167-dbe1-5b5a-38247cba5be3@redhat.com>
 <20161025204515.GA22860@flamenco>

On 25/10/2016 22:45, Emilio G. Cota wrote:
> On Tue, Oct 25, 2016 at 16:35:48 -0400, Pranith Kumar wrote:
>> On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini wrote:
>>>
>>>> I've written a patch (see below) to take the per-bucket sequence locks.
>>>
>>> What's the performance like?
>>>
>>
>> Applying only this patch, the perf numbers are similar to the 128-byte
>> cache-line alignment you suggested.
>
> That makes sense. Having a single seqlock per bucket is simple and fast;
> note that bucket chains should be very short (we use good hashing and
> automatic resize for this purpose).

But why do we get so much worse performance in the 100% reader case?
(And even more puzzling, why does Pranith's original patch improve
performance instead of causing more cache misses?)

Thanks,

Paolo
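
For context, the approach under discussion looks roughly like the sketch
below: one sequence counter per bucket, with each bucket aligned to its own
cache line so that readers of different buckets never false-share. This is
not the actual qht code; the struct layout, the names (qht_bucket_sketch,
bucket_read), the 64-byte line size and the 4-entry bucket are illustrative
assumptions.

/*
 * Illustrative sketch only -- not the real qht implementation.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumed; real code detects the host value */
#define BUCKET_ENTRIES  4       /* chains stay short: good hashing + resize  */

struct qht_bucket_sketch {
    atomic_uint seq;                       /* per-bucket sequence counter */
    uint32_t    hashes[BUCKET_ENTRIES];
    void       *pointers[BUCKET_ENTRIES];
} __attribute__((aligned(CACHE_LINE_SIZE)));  /* one bucket per cache line:
                                                 no false sharing between
                                                 readers of distinct buckets */

/* Seqlock-style lock-free lookup: retry if a writer raced with us. */
static void *bucket_read(struct qht_bucket_sketch *b, uint32_t hash)
{
    unsigned start;
    void *ret;

    for (;;) {
        start = atomic_load_explicit(&b->seq, memory_order_acquire);
        if (start & 1) {
            continue;                      /* odd count: writer in progress */
        }
        ret = NULL;
        for (size_t i = 0; i < BUCKET_ENTRIES; i++) {
            if (b->hashes[i] == hash) {
                ret = b->pointers[i];
                break;
            }
        }
        /* Order the data reads above before re-reading the counter
         * (the C11 analogue of smp_rmb()). */
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&b->seq, memory_order_relaxed) == start) {
            return ret;                    /* counter unchanged: consistent */
        }
    }
}

With this layout a 100%-reader workload should keep every bucket's cache line
in the shared state on all CPUs, which is what makes the slowdown discussed
above puzzling.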