All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <dstolee@microsoft.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Taylor Blau" <me@ttaylorr.com>, "René Scharfe" <l.s.r@web.de>,
	"Elijah Newren" <newren@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff
Date: Tue, 01 Jun 2021 14:58:30 +0000	[thread overview]
Message-ID: <pull.962.v2.git.1622559516.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.962.git.1622104642.gitgitgadget@gmail.com>

This series depends on en/ort-perf-batch-11 textually, but is semantically
independent of it.

Changes since v1 (all for the first patch):

 * Add more comments explaining the sorting function, its purpose, and how
   its expected to be called
 * Small style fixup
 * Switch back to using string_list_sort() instead of direct QSORT()

This short series has a few optimizations, but only one of which affects the
testcases of interest (namely, reducing our time spent on sorting an array).
It also fixes a few comments.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:        5.622 s ±  0.059 s     5.235 s ±  0.042 s
mega-renames:     10.127 s ±  0.073 s     9.419 s ±  0.107 s
just-one-mega:   500.3  ms ±  3.8  ms   480.1  ms ±  3.9  ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


Elijah Newren (5):
  merge-ort: replace string_list_df_name_compare with faster alternative
  diffcore-rename: avoid unnecessary strdup'ing in break_idx
  diffcore-rename: enable limiting rename detection to relevant
    destinations
  Fix various issues found in comments
  merge-ort: miscellaneous touch-ups

 diffcore-rename.c                   | 52 ++++++++++++++---
 diffcore.h                          |  2 +
 merge-ort.c                         | 86 +++++++++++++++++++++--------
 t/t6423-merge-rename-directories.sh |  2 +-
 4 files changed, 110 insertions(+), 32 deletions(-)


base-commit: 76e253793c9a1d7fdd1836d5e4db26dabd3d713a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-962%2Fnewren%2Fort-perf-batch-12-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-962/newren/ort-perf-batch-12-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/962

Range-diff vs v1:

 1:  5055dfce3281 ! 1:  c4a0f6a9510c merge-ort: replace string_list_df_name_compare with faster alternative
     @@ Commit message
          "-sort" to either rebase or merge, sorting an array takes more time than
          anything else.  Serves me right for naming my merge strategy this way.
      
     -    Rewrite the comparison function and remove as many levels of indirection
     -    as possible (e.g. the old code had
     -        cmp_items() ->
     -          string_list_df_name_compare() ->
     -            df_name_compare()
     -    now we just have sort_dirs_next_to_their_children()), and tweak it to be
     -    as optimized as possible for our specific case.  These changes reduced
     -    the time spent in "plist special sort" by ~25% in the mega-renames case.
     +    Rewrite the comparison function in a way that does not require finding
     +    out the lengths of the strings when comparing them.  While at it, tweak
     +    the code for our specific case -- no need to handle a variety of modes,
     +    for example.  The combination of these changes reduced the time spent in
     +    "plist special sort" by ~25% in the mega-renames case.
      
          For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
          performance work; instrument with trace2_region_* calls", 2020-10-28),
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
       /*** Function Grouping: functions related to process_entries() ***/
       
      -static int string_list_df_name_compare(const char *one, const char *two)
     -+static int sort_dirs_next_to_their_children(const void *a, const void *b)
     ++static int sort_dirs_next_to_their_children(const char *one, const char *two)
       {
      -	int onelen = strlen(one);
      -	int twolen = strlen(two);
     ++	unsigned char c1, c2;
     ++
       	/*
      -	 * Here we only care that entries for D/F conflicts are
      -	 * adjacent, in particular with the file of the D/F conflict
      -	 * appearing before files below the corresponding directory.
      -	 * The order of the rest of the list is irrelevant for us.
      +	 * Here we only care that entries for directories appear adjacent
     -+	 * to and before files underneath the directory.  In other words,
     -+	 * we do not want the natural sorting of
     ++	 * to and before files underneath the directory.  We can achieve
     ++	 * that by pretending to add a trailing slash to every file and
     ++	 * then sorting.  In other words, we do not want the natural
     ++	 * sorting of
      +	 *     foo
      +	 *     foo.txt
      +	 *     foo/bar
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
      +	 * To achieve this, we basically implement our own strcmp, except that
      +	 * if we get to the end of either string instead of comparing NUL to
      +	 * another character, we compare '/' to it.
     ++	 *
     ++	 * If this unusual "sort as though '/' were appended" perplexes
     ++	 * you, perhaps it will help to note that this is not the final
     ++	 * sort.  write_tree() will sort again without the trailing slash
     ++	 * magic, but just on paths immediately under a given tree.
       	 *
      -	 * To achieve this, we sort with df_name_compare and provide
      -	 * the mode S_IFDIR so that D/F conflicts will sort correctly.
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
      -	 * since in other cases any changes in their order due to
      -	 * sorting cause no problems for us.
      +	 * The reason to not use df_name_compare directly was that it was
     -+	 * just too expensive, so I had to reimplement it.
     ++	 * just too expensive (we don't have the string lengths handy), so
     ++	 * I had to reimplement it.
       	 */
      -	int cmp = df_name_compare(one, onelen, S_IFDIR,
      -				  two, twolen, S_IFDIR);
     --	/*
     ++
     + 	/*
      -	 * Now that 'foo' and 'foo/bar' compare equal, we have to make sure
      -	 * that 'foo' comes before 'foo/bar'.
     --	 */
     ++	 * NOTE: This function will never be called with two equal strings,
     ++	 * because it is used to sort the keys of a strmap, and strmaps have
     ++	 * unique keys by construction.  That simplifies our c1==c2 handling
     ++	 * below.
     + 	 */
      -	if (cmp)
      -		return cmp;
      -	return onelen - twolen;
     -+	const char *one = ((struct string_list_item *)a)->string;
     -+	const char *two = ((struct string_list_item *)b)->string;
     -+	unsigned char c1, c2;
      +
      +	while (*one && (*one == *two)) {
      +		one++;
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
      +	if (c1 == c2) {
      +		/* Getting here means one is a leading directory of the other */
      +		return (*one) ? 1 : -1;
     -+	}
     -+	else
     ++	} else
      +		return c1-c2;
       }
       
     @@ merge-ort.c: static void process_entries(struct merge_options *opt,
       
       	trace2_region_enter("merge", "plist special sort", opt->repo);
      -	plist.cmp = string_list_df_name_compare;
     --	string_list_sort(&plist);
     -+	QSORT(plist.items, plist.nr, sort_dirs_next_to_their_children);
     ++	plist.cmp = sort_dirs_next_to_their_children;
     + 	string_list_sort(&plist);
       	trace2_region_leave("merge", "plist special sort", opt->repo);
       
     - 	trace2_region_leave("merge", "process_entries setup", opt->repo);
 2:  7212816c8d47 = 2:  38713ed48273 diffcore-rename: avoid unnecessary strdup'ing in break_idx
 3:  19150b575058 = 3:  45e1de5fe780 diffcore-rename: enable limiting rename detection to relevant destinations
 4:  98c9a419b313 = 4:  2f26d7e935c0 Fix various issues found in comments
 5:  e85dad887f95 = 5:  7156f26ab299 merge-ort: miscellaneous touch-ups

-- 
gitgitgadget

  parent reply	other threads:[~2021-06-01 14:58 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-27  8:37 [PATCH 0/5] Optimization batch 12: miscellaneous unthemed stuff Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-05-27 21:00   ` René Scharfe
2021-05-27 22:47     ` Elijah Newren
2021-05-28 16:12       ` René Scharfe
2021-05-28 18:09         ` Elijah Newren
2021-05-28  1:32   ` Taylor Blau
2021-05-28  4:10     ` Elijah Newren
2021-05-27  8:37 ` [PATCH 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-01 14:58 ` Elijah Newren via GitGitGadget [this message]
2021-06-01 14:58   ` [PATCH v2 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-02 11:29     ` Derrick Stolee
2021-06-01 14:58   ` [PATCH v2 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-06-03 12:54     ` Derrick Stolee
2021-06-03 14:13       ` Elijah Newren
2021-06-01 14:58   ` [PATCH v2 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-03 12:55   ` [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04  4:39   ` [PATCH v3 0/4] " Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-04 13:11     ` [PATCH v3 0/4] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04 15:48       ` Elijah Newren
2021-06-04 16:30         ` Elijah Newren
2021-06-04 16:35         ` Jeff King
2021-06-04 18:42           ` Derrick Stolee
2021-06-04 19:43             ` Elijah Newren
2021-06-04 19:53             ` Jeff King
2021-06-08 16:11     ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.962.v2.git.1622559516.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.