From: Felipe Contreras <felipe.contreras@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Michael J Gruber <git@drmicha.warpmail.net>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Ilari Liusvaara <ilari.liusvaara@elisanet.fi>,
Daniel Barkalow <barkalow@iabervon.org>
Subject: Re: [PATCH v4 00/13] New remote-hg helper
Date: Fri, 2 Nov 2012 19:01:55 +0100
Message-ID: <CAMP44s1mbNBUspJ8SX=VwGSXthxWAHkrQLFRxzyCzkupLYSagA@mail.gmail.com>
In-Reply-To: <CAMP44s1P5Y_H24=ZKS5n_rUORf1dTiqg3qXm3bHcOiQ8K12PUQ@mail.gmail.com>
On Fri, Nov 2, 2012 at 5:41 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, Nov 2, 2012 at 3:48 PM, Jeff King <peff@peff.net> wrote:
>> On Thu, Nov 01, 2012 at 05:08:52AM +0100, Felipe Contreras wrote:
>>
>>> > Turns out msysgit's remote-hg is not exporting the whole repository,
>>> > that's why it's faster =/
>>>
>>> It seems the reason is that it would only export to the point where
>>> the branch is checked out. After updating to the tip I noticed
>>> there was a performance difference.
>>>
>>> I investigated and found two reasons:
>>>
>>> 1) msysgit's version doesn't export files twice, I've now implemented the same
>>> 2) msysgit's version uses a very simple algorithm to find out file changes
>>>
>>> This second point causes msysgit to miss some file changes. Using the
>>> same algorithm I get the same performance, but the output is not
>>> correct.
>>
>> Do you have a test case that demonstrates this? It would be helpful for
>> reviewers, but also helpful to msysgit people if they want to fix their
>> implementation.
>
> Cloning the mercurial repo:
>
> % hg log --stat -r 131
> changeset: 131:c9d51742471c
> parent: 127:44538462d3c8
> user: jake@edge2.net
> date: Sat May 21 11:35:26 2005 -0700
> summary: moving hgweb to mercurial subdir
>
> hgweb.py | 377 ------------------------------------------------------------------------------------------
> mercurial/hgweb.py | 377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 377 insertions(+), 377 deletions(-)
>
> % git show --stat 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
> commit 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
> Author: jake@edge2.net <none@none>
> Date: Sat May 21 11:35:26 2005 -0700
>
> moving hgweb to mercurial subdir
>
> mercurial/hgweb.py | 377 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 377 insertions(+)

I talked with some people in #mercurial, and apparently there is a
concept of a 'changelog' that is supposed to store these changes, but
since the format has changed, its content is unreliable. That's
not a big problem because it's used mostly for reporting purposes
(log, query), not for anything that has to be reliable.

To reliably see the changes, one has to compare the 'manifests' of the
revisions involved, which contain *all* the files in them.

That's what I was doing already, but I found a more efficient way to
do it. msysGit is using the changelog, which is quite fast, but not
reliable.
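
As a rough illustration of the manifest comparison, here is a minimal
sketch in plain Python. It assumes each manifest is available as a dict
mapping file path to file node; the function name, the README entry and
the node values are hypothetical placeholders, not mercurial's API or
real data.

```python
def manifest_changes(old, new):
    """Compare two manifests given as {path: file node} dicts.

    Returns (modified, added, removed) as sorted lists of paths.
    """
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    return modified, added, removed


# Hypothetical manifests modelled on changeset 131 above.
# The node values are placeholders, not real hashes.
parent = {"hgweb.py": "node-1", "README": "node-2"}
child = {"mercurial/hgweb.py": "node-1b", "README": "node-2"}

print(manifest_changes(parent, child))
# ([], ['mercurial/hgweb.py'], ['hgweb.py'])
```

Because both manifests list every file, the removal of hgweb.py shows
up here, which is the change missing from the git output above.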

Unfortunately, while going through mercurial's code, I found an issue,
and it turns out that 1) is not correct.

In mercurial, a file hash also covers the parent file nodes, which
means that even if two files have the same content, they would not
have the same hash. So there's no point in keeping track of them to
avoid extracting the data unnecessarily: in order to make sure
they are different, you need to extract the data anyway, defeating the
purpose.
Which means mercurial doesn't really behave as one would expect:
# add files with the same content
$ echo a > a
$ hg ci -Am adda
adding a
$ echo a >> a
$ hg ci -m changea
$ echo a > a
$ hg st --rev 0
$ hg ci -m reverta
$ hg log -G --template '{rev} {desc}\n'
@ 2 reverta
|
o 1 changea
|
o 0 adda
# check the difference between the first and the last revision
$ hg st --rev 0:2
M a
$ hg cat -r 0 a
a
$ hg cat -r 2 a
a
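
The session above can be reproduced outside mercurial with a minimal
sketch. It assumes, as far as I understand the revlog code, that a file
node is the SHA-1 of the two sorted parent nodes followed by the
content; it ignores copy metadata, and the names NULLID and file_node
are only for illustration.

```python
import hashlib

NULLID = b"\0" * 20  # node id standing in for a missing parent


def file_node(text, p1=NULLID, p2=NULLID):
    """SHA-1 over the two sorted parent nodes followed by the content."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


rev0 = file_node(b"a\n")          # adda
rev1 = file_node(b"a\na\n", rev0)  # changea
rev2 = file_node(b"a\n", rev1)     # reverta

# Same content as revision 0, but a different parent, so a different node.
print(rev0 == rev2)  # False
```

So comparing nodes alone reports 'a' as modified between 0 and 2, and
only comparing the extracted data shows that the content is identical.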

I will check again where the performance improvements came from,
but most likely it's my implementation of mercurial's
repo.status().

Cheers.
--
Felipe Contreras