git.vger.kernel.org archive mirror
* publish from certain commit onward, keeping earlier history private, but provable
From: Jörn Hees @ 2015-12-09 13:45 UTC
  To: git

Hi,

I've been hacking away on a library for quite some time and have a lot of commits in my private repository:

A -> B -> C -> D -> E

Finally, I'm nearing completion of a first version, and want to publish it to a remote called public from D onward keeping A..C to myself, so public should afterwards look like this:

D -> E

My main motivation is that I don't really want to put ridiculous first trials online, but I'd still like to be able to prove (on demand) how I arrived at D (think of copyright claims, etc.).

As (at the moment) it's pretty much impossible to reverse-engineer the hashes of commits in the chain with times and changesets, I thought just keeping D's parent pointer to C would be one of the genius advantages of git. Sadly, I can't find a way to actually make this work.

Can I somehow push D -> E to public, making it a fully functional public repository with all the objects needed to check out D or E, and with D still pointing to C as its parent? If not, why is that?

What doesn't seem to work:

- push with range
  
  git push public D..E:master
  error: src refspec D..E does not match any.
  error: failed to push some refs to '<public>'

- any form of squashing / history rewriting
  
  As far as I know, squashing A..D would introduce a new commit without the parent pointer to C, and thereby remove the ability to prove that A..C existed. (Simple example: say C reverted B; then you'd never be able to prove B was in there at some point.)

  I could obviously note the hash of C manually in the description of the squash commit, but there already is a parent pointer field, so why not use it?

  Also, in order to contribute further changes to public, I'd have to rebase my private branches on top of this new squashed commit, which just seems wrong...

- push from local clone with limited depth
  
  I thought I had found a solution: first create a local clone, local_public, with the desired depth, then push that clone to public:
  
  git clone --depth 2 file:///<abspath_private> local_public
  
  With
  
  git log --pretty=raw
  
  I can verify that local_public only contains D -> E and that the commit, tree and parent hashes are the same, which is exactly what I want.

  The problem is that when I try to push from local_public to an added public remote, I get an error like this:
  
  ! [remote rejected] master -> master (shallow update not allowed)


Any ideas how to make this work?

Cheers,
Jörn


* Re: publish from certain commit onward, keeping earlier history private, but provable
From: Johannes Löthberg @ 2015-12-09 17:54 UTC
  To: Jörn Hees; +Cc: git


On 09/12, Jörn Hees wrote:
>Hi,
>
>I've been hacking away on a library for quite some time and have a lot 
>of commits in my private repository:
>
>A -> B -> C -> D -> E
>
>Finally, I'm nearing completion of a first version, and want to publish 
>it to a remote called public from D onward keeping A..C to myself, so 
>public should afterwards look like this:
>
>D -> E
>
>My main motivation is that I don't really want to put ridiculous first
>trials online, but I'd still like to be able to prove (on demand) how I
>arrived at D (think of copyright claims, etc.).
>
>As (at the moment) it's pretty much impossible to reverse-engineer the
>hashes of commits in the chain with times and changesets, I thought
>just keeping D's parent pointer to C would be one of the genius
>advantages of git. Sadly, I can't find a way to actually make this work.
>
>Can I somehow push D -> E to public, making it a fully functional public
>repository with all the objects needed to check out D or E, and with
>D still pointing to C as its parent? If not, why is that?
>

Take a look at git-replace[0][1].

[0]: https://git-scm.com/2010/03/17/replace.html
[1]: https://www.kernel.org/pub/software/scm/git/docs/git-replace.html

-- 
Sincerely,
  Johannes Löthberg
  PGP Key ID: 0x50FB9B273A9D0BB5
  https://theos.kyriasis.com/~kyrias/



* Re: publish from certain commit onward, keeping earlier history private, but provable
From: Jeff King @ 2015-12-09 22:20 UTC
  To: Jörn Hees; +Cc: git

On Wed, Dec 09, 2015 at 02:45:44PM +0100, Jörn Hees wrote:

> I've been hacking away on a library for quite some time and have a lot of commits in my private repository:
> 
> A -> B -> C -> D -> E
> 
> Finally, I'm nearing completion of a first version, and want to
> publish it to a remote called public from D onward keeping A..C to
> myself, so public should afterwards look like this:
> 
> D -> E

The short answer is that you cannot do this without changing the names
(i.e., sha1 commit ids) of D and E.
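
To see why the ids must change, here is a minimal sketch in a throwaway repository (the repository, the author identity and the variable names are assumptions made for the demo). It shows that the parent id is part of the commit object's content, and that the commit id is derived from that content, so a D without its parent line is a different object:

```shell
#!/bin/sh
set -eu
# Throwaway identity and repository; all names here are illustrative.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

git commit -q --allow-empty -m C
git commit -q --allow-empty -m D

# The raw commit object for D contains a "parent <id of C>" line.
git cat-file -p HEAD

# The commit id is derived from exactly that content: hashing the raw
# object again reproduces it.
d_id=$(git rev-parse HEAD)
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Dropping the parent line yields a different object, hence a different id.
no_parent=$(git cat-file commit HEAD | sed '/^parent /d' |
    git hash-object -t commit --stdin)

echo "D:                $d_id"
echo "D, rehashed:      $rehashed"
echo "D without parent: $no_parent"
```

Because E in turn records D's id as its parent, changing D changes E as well, and so on up the chain.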

One of the fundamental assumptions git makes is that if a repository has
an object X, it also has all of the objects reachable from it (past
commits, their trees, subtrees, and blobs). This is what makes the
push/fetch object transfer efficient (one side says only "I have X" and
the other side knows "Ah, that is a whole chunk of objects I do not have
to bother sending", without the names of those objects going over the
wire).

The exception, of course, is shallow clones, where one side tells the
other "I am shallow at cutoff point Y; don't assume I have anything
below there". This does work, but there are some downsides (for
instance, we cannot apply some of the same reachability optimizations
for serving fetches).

>   I can verify that local_public only contains D -> E and that the
>   commit, tree and parent hashes are the same, which is exactly what I
>   want.
>   
>   The problem is that when I try to push from local_public to an added
>   public remote, I get an error like this:
>   
>   ! [remote rejected] master -> master (shallow update not allowed)

Right. The receiver must be explicitly configured to accept a shallow
push (I do not recall offhand whether clients fetching from you would
also need an explicit config to accept a shallow history).
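
As a hedged illustration of that receiver-side setting: the sketch below assumes the relevant option is `receive.shallowUpdate` (as described in the git-config documentation) and uses throwaway local repositories whose names and paths are made up for the demo. It only covers the push side; whether people fetching from the result need extra configuration is not addressed here.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)      # scratch directory; every path below is hypothetical
cd "$work"

# Private repository with history C -> D -> E.
git init -q private
for msg in C D E; do
    git -C private commit -q --allow-empty -m "$msg"
done

# Shallow clone holding only D -> E, and an empty bare "public" repository.
git clone -q --depth 2 "file://$work/private" local_public
git init -q --bare public.git
git -C local_public remote add public "$work/public.git"

# With the receiver's default configuration the push should be refused.
if git -C local_public push -q public HEAD:refs/heads/master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

# Opt in on the receiving side, then retry.
git -C public.git config receive.shallowUpdate true
git -C local_public push -q public HEAD:refs/heads/master

echo "first push: $first_push"
git -C public.git log --oneline refs/heads/master
```

The public repository ends up shallow itself, with D and E keeping their original ids, which is the trade-off described above.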

So the usual path here is to rewrite D and E (with the same trees, but
they will get new commit ids). If you want to retain the older history
(commits A-C), you can distribute it separately and use git-replace to
"graft" it onto the newer history at run-time.

You can do that with:

  # set up a run-time replacement view so that D appears to have
  # no parents; this doesn't impact the objects themselves, but
  # rather git will use our parent-less "replacement" D anytime
  # somebody mentions the original
  git replace --graft D

  # verify that the history is what you want; if you have a non-linear
  # history you may have to make several such "cuts" in the graph
  git log

  # now cement it into place by rewriting
  git filter-branch
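
For anyone who wants to try the recipe above before touching a real repository, here is a sketch on a throwaway linear history A..E. The file name, identity and variable names are assumptions for the demo, and `FILTER_BRANCH_SQUELCH_WARNING` only matters on newer git versions, which otherwise print a warning and pause before filter-branch runs.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
for msg in A B C D E; do
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done
old_d=$(git rev-parse HEAD~1)
old_e=$(git rev-parse HEAD)

# Replacement view: D appears to have no parents.
git replace --graft "$old_d"

# Make that view permanent for the current branch. HEAD is spelled out
# here; the originals stay reachable under refs/original/.
git filter-branch HEAD >/dev/null 2>&1

new_e=$(git rev-parse HEAD)
new_d=$(git rev-parse HEAD~1)

# Inspect the rewritten branch with replacements switched off.
git --no-replace-objects log --format='%h %s'
echo "D: $old_d -> $new_d"
echo "E: $old_e -> $new_e"
```

The trees are untouched, so a checkout of the rewritten E has the same files as the original E; only the commit ids differ.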

Of course that is a bitter pill to swallow if you have reasons for
wanting to use the old sha1s. E.g., you have internal development
proceeding against the old tree and want to share a truncated version
with the public.  In that case I still think the least painful thing is
to rewrite the truncated history, have _everyone_, internal and public
work against that, and let internal folks graft the old history on for
their own use. They can do that with:

  git replace --graft the-rewritten-D the-original-C
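
A small sketch of that internal graft, again in a throwaway repository with made-up names. The rewritten D and E are stand-ins built directly with `git commit-tree` to keep the demo short; in practice they would come from the rewrite described earlier.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for msg in A B C D E; do
    git commit -q --allow-empty -m "$msg"
done
orig_c=$(git rev-parse HEAD~2)

# Stand-ins for the rewritten commits: same trees, but D' has no parent.
new_d=$(git commit-tree -m D "HEAD~1^{tree}")
new_e=$(git commit-tree -m E -p "$new_d" "HEAD^{tree}")

# Without a graft, the rewritten line of history is D' -> E'.
before=$(git rev-list --count "$new_e")

# Internal users attach the private history underneath the rewritten D.
git replace --graft "$new_d" "$orig_c"
after=$(git rev-list --count "$new_e")

echo "commits visible before graft: $before, after graft: $after"
```

The replacement lives in refs/replace/ in the local repository, so it only affects people who create (or deliberately fetch) it; the public ids stay as published.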

-Peff


* Re: publish from certain commit onward, keeping earlier history private, but provable
From: Jeff King @ 2015-12-09 22:24 UTC
  To: Jörn Hees; +Cc: git

On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:

> Of course that is a bitter pill to swallow if you have reasons for
> wanting to use the old sha1s. E.g., you have internal development
> proceeding against the old tree and want to share a truncated version
> with the public.

After re-reading your email, it looks like your use case is just to be
able to later prove the existence of the original history. You could do
that by mentioning the original "C" in your truncated "D", but in a way
that git does not follow when computing reachability. For instance,
amend D's commit message to say:

  This is based on earlier, unpublished work going up to commit C.

Then retain C for yourself, and show it only to those you want to prove
its contents to.
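
One possible way to build such a commit, sketched under assumptions: instead of amending, it creates a new parentless commit from D's tree with `git commit-tree`, and the branch name `public-d`, the file name and the identity are invented for the demo.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for msg in A B C D; do
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done
c_id=$(git rev-parse HEAD~1)

# New root commit with D's tree; C is named in the message only, so git
# never treats it as reachable from here.
d_pub=$(git commit-tree -m "D" \
    -m "This is based on earlier, unpublished work going up to commit $c_id." \
    "HEAD^{tree}")
git branch public-d "$d_pub"    # hypothetical name for the branch to publish

git log --format='%H%n%B' public-d
```

Only `public-d` (and whatever is built on top of it) would be pushed; the original branch containing A..D stays in the private repository.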

-Peff


* Re: publish from certain commit onward, keeping earlier history private, but provable
From: Stefan Beller @ 2015-12-09 22:29 UTC
  To: Jeff King; +Cc: Jörn Hees, git@vger.kernel.org

On Wed, Dec 9, 2015 at 2:24 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
>
>> Of course that is a bitter pill to swallow if you have reasons for
>> wanting to use the old sha1s. E.g., you have internal development
>> proceeding against the old tree and want to share a truncated version
>> with the public.
>
> After re-reading your email, it looks like your use case is just to be
> able to later prove the existence of the original history. You could do
> that by mentioning the original "C" in your truncated "D", but in a way
> that git does not follow when computing reachability. For instance,
> amend D's commit message to say:
>
>   This is based on earlier, unpublished work going up to commit C.
>
> Then retain C for yourself, and show it only to those you want to prove
> its contents to.

I'd rather keep D for yourself and create a D', which is just D without
its parent and with the note above, such that the tree of D and parts of
the commit message are obvious from looking at D'. All that stays secret
is D's parent and commit information such as the exact date. (The
committer could be guessed easily.)

>
> -Peff


* Re: publish from certain commit onward, keeping earlier history private, but provable
From: Jeff King @ 2015-12-09 22:50 UTC
  To: Stefan Beller; +Cc: Jörn Hees, git@vger.kernel.org

On Wed, Dec 09, 2015 at 02:29:12PM -0800, Stefan Beller wrote:

> On Wed, Dec 9, 2015 at 2:24 PM, Jeff King <peff@peff.net> wrote:
> > On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
> >
> >> Of course that is a bitter pill to swallow if you have reasons for
> >> wanting to use the old sha1s. E.g., you have internal development
> >> proceeding against the old tree and want to share a truncated version
> >> with the public.
> >
> > After re-reading your email, it looks like your use case is just to be
> > able to later prove the existence of the original history. You could do
> > that by mentioning the original "C" in your truncated "D", but in a way
> > that git does not follow when computing reachability. For instance,
> > amend D's commit message to say:
> >
> >   This is based on earlier, unpublished work going up to commit C.
> >
> > Then retain C for yourself, and show it only to those you want to prove
> > its contents to.
> 
> I'd rather keep D for yourself and create a D', which is just D without
> its parent and with the note above, such that the tree of D and parts of
> the commit message are obvious from looking at D'. All that stays secret
> is D's parent and commit information such as the exact date. (The
> committer could be guessed easily.)

I think the point is that all of this is happening at time t (let's say
2015), and the proof may be needed at time t+N (let's say 2020).

Showing the original D (or C, or whatever) at that point proves nothing,
as you could have created a fake history in 2020 that "ends up" at the
D' tree. You need to publish _something_ in 2015 that says "I know this
thing, but I am not willing to show it to you yet".

The classic way of doing this is to take out a small ad in the
classified section of a print newspaper with a hash of your data.
Libraries keep archives of the paper, so later you can prove that you
have the data that matches the hash, and its timestamp is certified by
the library archives.

Here we're abusing Git as the notary. If everyone spends the years from
2015-2020 building on top of D', then they can all reasonably agree that
the content of D' was written in 2015, and any commit hash it mentions
had to have existed then. Revealing C (or the original D, or whatever
hash you want to mention) proves the data.
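
A rough sketch of what the later "reveal" step could look like, assuming the hidden history is handed over as a git bundle (that transport, and all repository names and paths, are choices made for this demo). The verifier reads the id named in the public commit message, fetches the bundle, and checks that a commit with exactly that id is now present; git derives object ids from content when it unpacks, so the id cannot simply be asserted by the sender.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)      # scratch directory; every path below is hypothetical
cd "$work"

# "2015": private history A -> B -> C, and a public root commit naming C.
git init -q private
for msg in A B C; do
    git -C private commit -q --allow-empty -m "$msg"
done
c_id=$(git -C private rev-parse HEAD)
git init -q public
git -C public commit -q --allow-empty -m "D'" \
    -m "Based on unpublished work up to commit $c_id."

# "2020": the author packs the hidden history into a bundle.
git -C private bundle create "$work/proof.bundle" HEAD 2>/dev/null

# The verifier starts from the public repository only.
git clone -q public verifier
claimed=$(git -C verifier log -1 --format=%B | grep -Eo '[0-9a-f]{40,}')
git -C verifier fetch -q "$work/proof.bundle" HEAD

if git -C verifier cat-file -e "$claimed^{commit}" &&
   [ "$(git -C verifier rev-parse FETCH_HEAD)" = "$claimed" ]; then
    verdict=match
else
    verdict=mismatch
fi
echo "claimed $claimed: $verdict"
```

This only shows that the revealed history hashes to the id published earlier; when the public commit was actually published still has to be established by the people who built on top of it, as described above.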

-Peff

