From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= Subject: [PATCH] index-pack: avoid false sharing on multicore systems Date: Fri, 27 Apr 2012 23:52:41 +0700 Message-ID: <1335545561-933-1-git-send-email-pclouds@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= To: git@vger.kernel.org X-From: git-owner@vger.kernel.org Fri Apr 27 18:56:15 2012 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SNoSq-0004K0-Gt for gcvg-git-2@plane.gmane.org; Fri, 27 Apr 2012 18:56:12 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760779Ab2D0Q4H convert rfc822-to-quoted-printable (ORCPT ); Fri, 27 Apr 2012 12:56:07 -0400 Received: from mail-ob0-f174.google.com ([209.85.214.174]:56161 "EHLO mail-ob0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758382Ab2D0Q4E (ORCPT ); Fri, 27 Apr 2012 12:56:04 -0400 Received: by obbta14 with SMTP id ta14so1277872obb.19 for ; Fri, 27 Apr 2012 09:56:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:x-mailer:mime-version :content-type:content-transfer-encoding; bh=bQWnMQgzyfKxnH+558Cydm/qLLB1yEaqUYRvRtBqb9E=; b=w1VG+G8fGl1hjIEmkJxsuK7gczxurQH7P1dQWLEEUfyPhFaNtup//oMAVjp3lAhvBm 0Hssp4qBwmPLhDmuXYM44raf1BSf92GRn3KZlj2KnJX8sTJ99enokcDZllfiPFtl+Gw8 VXuiqNsroJYRqQqLEkQv9lWyj/7MyZRE01OA+SmVWuojmnV6fp5kl2tEKC3JCdoD7cM0 BQiDfwImkQ2n8oBeJcGqUW26Mi+9rAIJdWbH+5w3kPFO2T9sZmg06miGPZ74R9vqs6yY 0mNl02aZ85lbFGnShF8yrokMnN+odyUNcLdtooCKfXXbR9OTJK3kjTNwUNnHs8/cg3cd 7rbA== Received: by 10.182.179.68 with SMTP id de4mr15577270obc.13.1335545764348; Fri, 27 Apr 2012 09:56:04 -0700 (PDT) Received: from pclouds@gmail.com ([115.74.58.219]) by mx.google.com with ESMTPS id xb7sm7719833obb.10.2012.04.27.09.56.00 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 27 Apr 2012 09:56:03 -0700 (PDT) Received: by pclouds@gmail.com (sSMTP sendmail emulation); Fri, 27 Apr 2012 23:52:43 +0700 X-Mailer: git-send-email 1.7.8.36.g69ee2 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: thread_data[] is frequently accessed by all threads and it's packed tight. If two threads running on two different processors, updating one "struct thread_local" may invalidate the cache line that also contains struct thread_local of the other thread, leading to cache misses (aka false sharing [1]) Pad it properly. Make sure each "struct thread_local" is on a separate cache line. This only works with real processors though. Multithread on single processor may be harmed by this change. Keep the tight layout in that case. [1] http://software.intel.com/en-us/articles/avoiding-and-identifying-f= alse-sharing-among-threads/ Signed-off-by: Nguy=E1=BB=85n Th=C3=A1i Ng=E1=BB=8Dc Duy --- My laptop has one physical core with hyperthread, so this patch does no good for me. I won't have access to a multicore system until the middle of next week, so I'm not sure if it actually helps at the moment. builtin/index-pack.c | 37 +++++++++++++++++++++++++++++++++---- 1 files changed, 33 insertions(+), 4 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index c1c3c81..598b18e 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -93,7 +93,8 @@ static int input_fd, output_fd, pack_fd; =20 #ifndef NO_PTHREADS =20 -static struct thread_local *thread_data; +static struct thread_local **thread_data; +static char *thread_data_buffer; =20 static pthread_mutex_t read_mutex; #define read_lock() lock_mutex(&read_mutex) @@ -126,11 +127,38 @@ static inline void unlock_mutex(pthread_mutex_t *= mutex) */ static void init_thread(void) { + int i, stripe; + init_recursive_mutex(&read_mutex); pthread_mutex_init(&counter_mutex, NULL); pthread_mutex_init(&work_mutex, NULL); pthread_key_create(&key, NULL); thread_data =3D xcalloc(nr_threads, sizeof(*thread_data)); + + + if (online_cpus() && !getenv("GIT_ONE_CPU_CORE")) { + /* + * A typical cache line is 64 byte long. Avoid false + * sharing by putting thread data of each thread on a + * separate cache line + */ + stripe =3D 64; + assert(sizeof(struct thread_local) <=3D stripe); + thread_data_buffer =3D xmalloc((nr_threads + 1) * stripe); +#define ALIGN(x, stripe) (((x) + (stripe) - 1) & ~((stripe) - 1)) + thread_data[0] =3D (struct thread_local *)ALIGN((unsigned int)thread= _data_buffer, stripe); + } else { + /* + * On the other hand, on single physical cpu, padding + * wastes cache for nothing. Don't do it. + */ + stripe =3D sizeof(struct thread_local); + thread_data_buffer =3D xmalloc(nr_threads * stripe); + thread_data[0] =3D (struct thread_local *)thread_data_buffer; + } + memset(thread_data[0], 0, stripe * nr_threads); + for (i =3D 1; i < nr_threads; i++) + thread_data[i] =3D (struct thread_local*)((char*)(thread_data[0]) + = i * stripe); threads_active =3D 1; } =20 @@ -143,6 +171,7 @@ static void cleanup_thread(void) pthread_mutex_destroy(&counter_mutex); pthread_mutex_destroy(&work_mutex); pthread_key_delete(key); + free(thread_data_buffer); free(thread_data); } =20 @@ -890,13 +919,13 @@ static void parse_pack_objects(unsigned char *sha= 1) if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) { init_thread(); for (i =3D 0; i < nr_threads; i++) { - int ret =3D pthread_create(&thread_data[i].thread, NULL, - threaded_second_pass, thread_data + i); + int ret =3D pthread_create(&thread_data[i]->thread, NULL, + threaded_second_pass, thread_data[i]); if (ret) die("unable to create thread: %s", strerror(ret)); } for (i =3D 0; i < nr_threads; i++) - pthread_join(thread_data[i].thread, NULL); + pthread_join(thread_data[i]->thread, NULL); =20 cleanup_thread(); return; --=20 1.7.8.36.g69ee2