git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: three-way diff performance problem
Date: Tue, 21 Jul 2009 14:01:03 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.2.01.0907211349570.19335@localhost.localdomain> (raw)
In-Reply-To: <7vd47tbw7q.fsf@alter.siamese.dyndns.org>



On Tue, 21 Jul 2009, Junio C Hamano wrote:
> 
> I know why.  The conversion was wrong.  The original found the last_one
> that was from the parent we are looking at, and when there is such, it
> started the scan from the one _after that_.  Otherwise, if lost_head list
> had an entry
> 
> 	@@ -l,k +m,n @@
>         -one
>          two
>          three
> 
> and the diff with the parent we are currently looking at duplicates
> removal:
> 
> 	@@ -l,k +m1,n1 @@
> 	-one
>         -one
>          two
>          three
> 
> we will end up losing the second removal, which would be what is happening
> with the patch you tried.

Ok, that matches what I saw. Doing a 'diff' between the outputs showed 
missing lines from the result with your first patch.

> I actually was scratching my head wondering why it wasn't happening in the
> original code after I sent that faulty patch.
> 
> Here is another attempt.

Yup. This attempt now generates output that matches the original one (in 
44s), while at the same time still handling the previously very expensive 
case quickly (1.4s).

So it looks good now from my testing - I'm not going to say anything else 
about the patch, since I'm not all that familiar with the code itself.

Btw, the 'perf report' on the fixed build now says

    89.67%               git  /home/torvalds/git/git     [.] xdl_prepare_env
     6.42%               git  /home/torvalds/git/git     [.] xdl_recs_cmp
     0.92%               git  [kernel]                   [k] hpet_next_event
     0.52%               git  /home/torvalds/git/git     [.] xdl_prepare_ctx
     0.44%               git  /lib64/libz.so.1.2.3       [.] inflate_fast
     0.29%               git  /home/torvalds/git/git     [.] xdl_hash_record
     0.27%               git  /lib64/libc-2.10.1.so      [.] __GI_memcmp
     0.26%               git  /home/torvalds/git/git     [.] show_patch_diff
     0.14%               git  [kernel]                   [k] _spin_lock
     0.09%               git  [kernel]                   [k] clear_page_c
     0.09%               git  /lib64/libz.so.1.2.3       [.] adler32

so now the expensive part is xdl_prepare_env. And that does actually make 
sense: it's the code that generates all the hash records for the lines to 
be compared, so there doesn't seem to be anything hugely wrong with it. If 
there are a lot of identical lines, I'd expect that to be fairly 
expensive.

			Linus

      reply	other threads:[~2009-07-21 21:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-21 18:10 three-way diff performance problem Linus Torvalds
2009-07-21 18:16 ` Linus Torvalds
2009-07-21 19:21 ` Junio C Hamano
2009-07-21 19:31   ` Linus Torvalds
2009-07-21 19:47   ` Junio C Hamano
2009-07-21 20:34     ` Linus Torvalds
2009-07-21 20:46       ` Junio C Hamano
2009-07-21 21:01         ` Linus Torvalds [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.01.0907211349570.19335@localhost.localdomain \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).