From: Felipe Contreras
Subject: Re: [PATCH v4 00/13] New remote-hg helper
Date: Mon, 5 Nov 2012 16:36:56 +0100
To: Michael J Gruber
Cc: Jeff King, Johannes Schindelin, git@vger.kernel.org, Junio C Hamano, Sverre Rabbelier, Ilari Liusvaara, Daniel Barkalow
In-Reply-To: <5097C970.9010901@drmicha.warpmail.net>

On Mon, Nov 5, 2012 at 3:13 PM, Michael J Gruber wrote:
> Felipe Contreras venit, vidit, dixit 02.11.2012 19:01:
>> I talked with some people in #mercurial, and apparently there is a
>> concept of a 'changelog' that is supposed to store these changes, but
>> since the format has changed, the content of it is unreliable. That's
>> not a big problem because it's used mostly for reporting purposes
>> (log, query), not for doing anything reliable.
>
> Is the changelog stored in the repo (i.e. generated by the hg version at
> commit time) or generated on the fly (i.e. generated by the hg version
> at hand)? See also below.

I don't know. I would expect it to be the former, and then when the
format changes, generated by the tool that did the conversion.

>> To reliably see the changes, one has to compare the 'manifest' of the
>> revisions involved, which contain *all* the files in them.
>
> 'manifest' == '(exploded) tree', right? Just making sure my hg fu is not
> subzero.

Yeah, the tree. As I said, it contains all the files.

>> That's what I was doing already, but I found a more efficient way to
>> do it. msysGit is using the changelog, which is quite fast, but not
>> reliable.
>>
>> Unfortunately while going through mercurial's code, I found an issue,
>> and it turns out that 1) is not correct.
>>
>> In mercurial, a file hash contains also the parent file nodes, which
>> means that even if two files have the same content, they would not
>> have the same hash, so there's no point in keeping track of them to
>> avoid extracting the data unnecessarily, because in order to make sure
>> they are different, you need to extract the data anyway, defeating the
>> purpose.
>
> Do I understand correctly that neither the msysgit version nor yours can
> detect duplicate blobs (without requesting them) because of that sha1 issue?

That's correct.

> I'm really wondering why a file blob hash carries its history along in
> the sha1. This appears completely strange to gitters (being brain washed
> about "content tracking"), but may be due to hg's extensive use of
> delta, or really: delta chains (which do have their merit on the server
> side).

It is a surprise to me too. I see absolutely no reason why that would
be useful. It seems like bazaar does store the file hashes without the
parent info, like git.

>> Which means mercurial doesn't really behave as one would expect:
>>
>> # add files with the same content
>>
>> $ echo a > a
>> $ hg ci -Am adda
>> adding a
>> $ echo a >> a
>> $ hg ci -m changea
>> $ echo a > a
>> $ hg st --rev 0
>> $ hg ci -m reverta
>> $ hg log -G --template '{rev} {desc}\n'
>> @ 2 reverta
>> |
>> o 1 changea
>> |
>> o 0 adda
>>
>> # check the difference between the first and the last revision
>>
>> $ hg st --rev 0:2
>> M a
>> $ hg cat -r 0 a
>> a
>> $ hg cat -r 2 a
>> a
>
> That is really scary. What use is "hg stat --rev" then? Not blaming you
> for hg, of course.
>
> On that tangent, I just noticed recently that hg has no python api.
> Seriously [1]. They even tell us not to use the internal python api.
> msysgit has been lacking support for newer hg, and you've had to add
> support for older versions (hg 1.9 will be around on quite some
> stable/LTS/EL distro releases) after developing on newer/current ones.
> I'm wondering how well that scales in the long term (telling from
> git-svn experience: it does not scale well), or whether using some
> stable api like 'hgapi' would be a huge bottleneck.

I don't know. I have never really used mercurial until recently. I
don't know how often they change their APIs and/or repository formats.
I would say the burden of updating to newer APIs is probably much less
than the burden of implementing code that accesses their repositories
directly, and eventually possibly rewriting the code when they change
the format.

If we were to access the repository directly, I would choose to use
Ruby for that, but given that 'we' is increasingly looking like 'I', I
probably wouldn't.

Cheers.

-- 
Felipe Contreras
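
For illustration only, here is a rough Python sketch of the hashing
difference discussed above. It assumes the usual description of
Mercurial's file node id (SHA-1 over the two parent node ids in sorted
order, followed by the file text) and ignores copy/rename metadata. The
helper names (hg_file_node, git_blob_id, changed_paths) are made up for
this example and are not part of Mercurial, Git, or the remote-hg
patches.

```python
import hashlib

NULLID = b"\0" * 20  # Mercurial's "no parent" node id


def hg_file_node(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
    """Sketch of a Mercurial file node id: SHA-1 over sorted parents, then text.

    Assumption: copy/rename metadata is ignored here.
    """
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()


def git_blob_id(text: bytes) -> bytes:
    """Git blob id: SHA-1 over a 'blob <size>' header, a NUL byte, and the content."""
    return hashlib.sha1(b"blob %d\0" % len(text) + text).digest()


def changed_paths(old: dict, new: dict) -> list:
    """Paths whose node id differs between two manifests (path -> node id)."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# The three revisions of file 'a' from the quoted session.
rev0 = hg_file_node(b"a\n")               # adda
rev1 = hg_file_node(b"a\na\n", p1=rev0)   # changea
rev2 = hg_file_node(b"a\n", p1=rev1)      # reverta

# Same content in rev 0 and rev 2, but the Mercurial-style ids differ
# because the parent id is mixed into the hash.
print(rev0 == rev2)

# A Git-style blob id depends on the content alone.
print(git_blob_id(b"a\n") == git_blob_id(b"a\n"))

# Comparing manifests by node id therefore reports 'a' as changed.
print(changed_paths({"a": rev0}, {"a": rev2}))
```

Under these assumptions, a manifest comparison by node id alone cannot
tell a reverted file from a modified one, so an importer would still
have to extract the data to know whether the content really differs.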