From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1162106AbcFGUNq (ORCPT ); Tue, 7 Jun 2016 16:13:46 -0400
Received: from one.firstfloor.org ([193.170.194.197]:40475 "EHLO
	one.firstfloor.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1162054AbcFGUNo (ORCPT );
	Tue, 7 Jun 2016 16:13:44 -0400
Date: Tue, 7 Jun 2016 13:13:40 -0700
From: Andi Kleen
To: Waiman Long
Cc: Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" ,
	Tejun Heo , Christoph Lameter , linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ingo Molnar , Peter Zijlstra ,
	Andi Kleen , Dave Chinner , Boqun Feng , Scott J Norton ,
	Douglas Hatch
Subject: Re: [RESEND PATCH 1/5] lib/dlock-list: Distributed and lock-protected lists
Message-ID: <20160607201340.GL13997@two.firstfloor.org>
References: <1465328155-56754-1-git-send-email-Waiman.Long@hpe.com>
	<1465328155-56754-2-git-send-email-Waiman.Long@hpe.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1465328155-56754-2-git-send-email-Waiman.Long@hpe.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 07, 2016 at 03:35:51PM -0400, Waiman Long wrote:
> Linked list is used everywhere in the Linux kernel. However, if many
> threads are trying to add or delete entries into the same linked list,
> it can create a performance bottleneck.
>
> This patch introduces a new list APIs that provide a set of distributed
> lists (one per CPU), each of which is protected by its own spinlock.

One thing I don't like is that it is per CPU. One per CPU is almost
certainly overkill and not needed for true scalability, especially
on systems using SMT. Also it makes the case where everything has to
be walked more and more expensive, because all these locks have to be
taken. Even when not contended this will add up.

It would be better to do this per every Nth CPU. I don't have a
clear answer for what the best N is, but I'm pretty sure it's > 1.
For example, on SMT systems it could at least be per core instead of
per thread. Likely it should be even more coarse grained, although
per socket may not be good enough.

-Andi
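
To make the every-Nth-CPU idea concrete, here is a minimal user-space
sketch. It is not code from the posted patch: dlist_index(),
dlist_count() and the cpus_per_list parameter are hypothetical names
used only for illustration. It also assumes that CPUs sharing a list
are numbered contiguously, which is not how SMT siblings are numbered
on many systems, so a real version would have to consult the kernel's
CPU topology information instead of dividing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the posted patch: map a CPU number
 * to the index of the list (and lock) it shares with its neighbours.
 * cpus_per_list is the "N" from the discussion; N == 1 is the per-CPU
 * case. Assumes CPUs that share a list are numbered contiguously.
 */
static size_t dlist_index(unsigned int cpu, unsigned int cpus_per_list)
{
	assert(cpus_per_list > 0);
	return cpu / cpus_per_list;
}

/*
 * Number of lists for nr_cpus CPUs, which is also the number of locks
 * a full walk has to take. Rounds up so that a trailing partial group
 * of CPUs still gets a list of its own.
 */
static size_t dlist_count(unsigned int nr_cpus, unsigned int cpus_per_list)
{
	assert(cpus_per_list > 0);
	return ((size_t)nr_cpus + cpus_per_list - 1) / cpus_per_list;
}
```

With N = 1 the number of locks a full walk takes equals the number of
CPUs. With N = 2 (roughly per core on a two-thread SMT system) it is
halved, at the price of two CPUs contending on each list.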