lustre-devel-lustre.org archive mirror
From: James Simmons <jsimmons@infradead.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	devel@driverdev.osuosl.org,
	Andreas Dilger <andreas.dilger@intel.com>,
	Oleg Drokin <oleg.drokin@intel.com>, NeilBrown <neilb@suse.com>
Cc: Dmitry Eremin <dmitry.eremin@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 24/25] staging: lustre: libcfs: change CPT estimate algorithm
Date: Mon, 16 Apr 2018 00:10:06 -0400
Message-ID: <1523851807-16573-25-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1523851807-16573-1-git-send-email-jsimmons@infradead.org>

From: Dmitry Eremin <dmitry.eremin@intel.com>

The idea of having more CPU partitions is based on KNL experience.
When a thread submits IO for network communication, one of the threads
from the current CPT is used for the network stack. With high
parallelization, many threads become involved in network submission,
but with few CPU partitions they must wait until a single thread
processes their requests from the network queue. So with a small number
of CPU partitions the bottleneck just moves into the network layer. My
experiments showed the best performance when each IO thread has one
network thread. This condition can be met by having 2 real HW cores
(not counting hyper-threads) per CPT. This is exactly what this patch
implements.

Change the CPT estimate algorithm from 2 * (N - 1)^2 < NCPUS <= 2 * N^2
to 2 HW cores per CPT. This is critical for machines whose number of
cores is not a power of two.
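
To make the difference concrete, here is a minimal userspace sketch of
both estimates, modelled on the loops in the diff for
cfs_cpt_num_estimate(). It is not the kernel code itself: the function
names cpt_estimate_old() and cpt_estimate_new() are made up for
illustration, CPT_WEIGHT_MIN is assumed to be 4, the NUMA node count and
the threads-per-core count are passed in as plain integers instead of
being read from the kernel, and the 32-bit clamp that follows in the real
function is left out.

```c
#include <assert.h>

/* Assumed value for illustration; the real constant lives in linux-cpu.c. */
#define CPT_WEIGHT_MIN 4

/*
 * Old estimate: pick a power-of-two ncpt with ncpu <= 2 * ncpt^2, then
 * scale the NUMA node count (nnode) by powers of two toward that value.
 */
static int cpt_estimate_old(int ncpu, int nnode)
{
	int ncpt;

	assert(ncpu > 0 && nnode > 0);
	if (ncpu <= CPT_WEIGHT_MIN)
		return 1;

	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
		; /* nothing */

	if (ncpt <= nnode) {		/* fat NUMA system */
		while (nnode > ncpt)
			nnode >>= 1;
	} else {			/* ncpt > nnode */
		while ((nnode << 1) <= ncpt)
			nnode <<= 1;
	}
	return nnode;
}

/*
 * New estimate: smallest ncpt >= 2 such that each CPT holds at most
 * 2 HW cores, i.e. 2 * nthr logical CPUs (nthr = threads per core).
 */
static int cpt_estimate_new(int ncpu, int nthr)
{
	int ncpt = 1;

	assert(ncpu > 0 && nthr > 0);
	if (ncpu > CPT_WEIGHT_MIN)
		for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
			; /* nothing */
	return ncpt;
}
```

For a KNL with 272 logical CPUs and 4 threads per core,
cpt_estimate_old(272, 1) gives 16 and cpt_estimate_new(272, 4) gives 34,
which matches the npartitions values in the two listings in this message.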

The current algorithm splits CPTs on KNL as follows:
LNet: HW CPU cores: 272, npartitions: 16
cpu_partition_table=
    0       : 0-4,68-71,136-139,204-207
    1       : 5-9,73-76,141-144,209-212
    2       : 10-14,78-81,146-149,214-217
    3       : 15-17,72,77,83-85,140,145,151-153,208,219-221
    4       : 18-21,82,86-88,150,154-156,213,218,222-224
    5       : 22-26,90-93,158-161,226-229
    6       : 27-31,95-98,163-166,231-234
    7       : 32-35,89,100-103,168-171,236-239
    8       : 36-38,94,99,104-105,157,162,167,172-173,225,230,235,240-241
    9       : 39-43,107-110,175-178,243-246
    10      : 44-48,112-115,180-183,248-251
    11      : 49-51,106,111,117-119,174,179,185-187,242,253-255
    12      : 52-55,116,120-122,184,188-190,247,252,256-258
    13      : 56-60,124-127,192-195,260-263
    14      : 61-65,129-132,197-200,265-268
    15      : 66-67,123,128,133-135,191,196,201-203,259,264,269-271

The new algorithm will split CPTs on KNL as follows:
LNet: HW CPU cores: 272, npartitions: 34
cpu_partition_table=
    0       : 0-1,68-69,136-137,204-205
    1       : 2-3,70-71,138-139,206-207
    2       : 4-5,72-73,140-141,208-209
    3       : 6-7,74-75,142-143,210-211
    4       : 8-9,76-77,144-145,212-213
    5       : 10-11,78-79,146-147,214-215
    6       : 12-13,80-81,148-149,216-217
    7       : 14-15,82-83,150-151,218-219
    8       : 16-17,84-85,152-153,220-221
    9       : 18-19,86-87,154-155,222-223
    10      : 20-21,88-89,156-157,224-225
    11      : 22-23,90-91,158-159,226-227
    12      : 24-25,92-93,160-161,228-229
    13      : 26-27,94-95,162-163,230-231
    14      : 28-29,96-97,164-165,232-233
    15      : 30-31,98-99,166-167,234-235
    16      : 32-33,100-101,168-169,236-237
    17      : 34-35,102-103,170-171,238-239
    18      : 36-37,104-105,172-173,240-241
    19      : 38-39,106-107,174-175,242-243
    20      : 40-41,108-109,176-177,244-245
    21      : 42-43,110-111,178-179,246-247
    22      : 44-45,112-113,180-181,248-249
    23      : 46-47,114-115,182-183,250-251
    24      : 48-49,116-117,184-185,252-253
    25      : 50-51,118-119,186-187,254-255
    26      : 52-53,120-121,188-189,256-257
    27      : 54-55,122-123,190-191,258-259
    28      : 56-57,124-125,192-193,260-261
    29      : 58-59,126-127,194-195,262-263
    30      : 60-61,128-129,196-197,264-265
    31      : 62-63,130-131,198-199,266-267
    32      : 64-65,132-133,200-201,268-269
    33      : 66-67,134-135,202-203,270-271
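
The regular shape of the new table can be checked with a small sketch.
This is only an illustration of the listing above, not how libcfs builds
its tables: the helper knl_cpt_cpus() is hypothetical, and it assumes the
CPU numbering seen in this listing, where the hyper-thread siblings of
core c are c, c + 68, c + 136 and c + 204.

```c
#include <assert.h>

/* Values taken from the KNL listing above; all names are illustrative. */
enum { KNL_CORES = 68, KNL_THREADS = 4, CORES_PER_CPT = 2 };

/*
 * Fill cpus[] with the logical CPUs of partition cpt, assuming 2 HW cores
 * per CPT and sibling threads numbered core + t * KNL_CORES.
 * Returns the number of entries written.
 */
static int knl_cpt_cpus(int cpt, int cpus[CORES_PER_CPT * KNL_THREADS])
{
	int n = 0;

	assert(cpt >= 0 && cpt < KNL_CORES / CORES_PER_CPT);
	for (int t = 0; t < KNL_THREADS; t++)
		for (int c = 0; c < CORES_PER_CPT; c++)
			cpus[n++] = t * KNL_CORES + cpt * CORES_PER_CPT + c;
	return n;
}
```

Calling knl_cpt_cpus(0, cpus) fills in 0, 1, 68, 69, 136, 137, 204, 205,
the same CPUs as row 0 of the table, so every CPT holds exactly two
physical cores with all of their hyper-threads.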

The 'N' pattern does not always work well on KNL.
In flat mode it produces a single CPT containing all CPUs.

In SNC-4 mode:
cpu_partition_table=
    0       : 0-17,68-85,136-153,204-221
    1       : 18-35,86-103,154-171,222-239
    2       : 36-51,104-119,172-187,240-255
    3       : 52-67,120-135,188-203,256-271

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/24304
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 30 ++++------------------
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 915cfca..ae5fd16 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -768,34 +768,14 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
 
 static int cfs_cpt_num_estimate(void)
 {
-	int nnode = num_online_nodes();
+	int nthr = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
 	int ncpu = num_online_cpus();
-	int ncpt;
+	int ncpt = 1;
 
-	if (ncpu <= CPT_WEIGHT_MIN) {
-		ncpt = 1;
-		goto out;
-	}
-
-	/* generate reasonable number of CPU partitions based on total number
-	 * of CPUs, Preferred N should be power2 and match this condition:
-	 * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
-	 */
-	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
-		;
-
-	if (ncpt <= nnode) { /* fat numa system */
-		while (nnode > ncpt)
-			nnode >>= 1;
+	if (ncpu > CPT_WEIGHT_MIN)
+		for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
+			; /* nothing */
 
-	} else { /* ncpt > nnode */
-		while ((nnode << 1) <= ncpt)
-			nnode <<= 1;
-	}
-
-	ncpt = nnode;
-
-out:
 #if (BITS_PER_LONG == 32)
 	/* config many CPU partitions on 32-bit system could consume
 	 * too much memory
-- 
1.8.3.1
