From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
To: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] teach fast-export an --anonymize option
Date: Thu, 28 Aug 2014 17:46:15 +0100
Message-ID: <53FF5CD7.8040603@ramsay1.demon.co.uk>
In-Reply-To: <20140828123257.GA18642@peff.net>
On 28/08/14 13:32, Jeff King wrote:
> On Thu, Aug 28, 2014 at 05:30:44PM +0700, Duy Nguyen wrote:
>
>> On Thu, Aug 28, 2014 at 12:01 AM, Jeff King <peff@peff.net> wrote:
>>> You can get an overview of what will be shared
>>> by running a command like:
>>>
>>> git fast-export --anonymize --all |
>>> perl -pe 's/\d+/X/g' |
>>> sort -u |
>>> less
>>>
>>> which will show every unique line we generate, modulo any
>>> numbers (each anonymized token is assigned a number, like
>>> "User 0", and we replace it consistently in the output).
>>
>> I feel like this should be part of git-fast-export.txt, just to
>> increase the user's confidence in the tool (and I don't expect most
>> users to read this commit message).
>
> Hmph. Whenever I say "I think this patch is done", suddenly the comments
> start pouring in. :)
:-D
> I think you are right, though, and we could stand to explain
> the feature a little more in the documentation in general.
> How about this patch on top (or squashed in):
>
> -- >8 --
> Subject: docs/fast-export: explain --anonymize more completely
>
> The original commit made mention of this option, but not why
> one might want it or how they might use it. Let's try to be
> a little more thorough, and also explain how to confirm that
> the output really is anonymous.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Documentation/git-fast-export.txt | 63 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 59 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index 52831fa..dbe9a46 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -106,10 +106,9 @@ marks the same across runs.
> different from the commit's first parent).
>
> --anonymize::
> - Replace all refnames, paths, blob contents, commit and tag
> - messages, names, and email addresses in the output with
> - anonymized data, while still retaining the shape of history and
> - of the stored tree.
> + Anonymize the contents of the repository while still retaining
> + the shape of the history and stored tree. See the section on
> + `ANONYMIZING` below.
>
> --refspec::
> Apply the specified refspec to each ref exported. Multiple of them can
> @@ -147,6 +146,62 @@ referenced by that revision range contains the string
> 'refs/heads/master'.
>
>
> +ANONYMIZING
> +-----------
> +
> +If the `--anonymize` option is given, git will attempt to remove all
> +identifying information from the repository while still retaining enough
> +of the original tree and history patterns to reproduce some bugs. The
> +goal is that a git bug which is found on a private repository will
s/goal/hope/ ;-)
> +persist in the anonymized repository, and the latter can be shared with
> +git developers to help solve the bug.
> +
> +With this option, git will replace all refnames, paths, blob contents,
> +commit and tag messages, names, and email addresses in the output with
> +anonymized data. Two instances of the same string will be replaced
> +equivalently (e.g., two commits with the same author will have the same
> +anonymized author in the output, but bear no resemblance to the original
> +author string). The relationship between commits, branches, and tags is
> +retained, as well as the commit timestamps (but the commit messages and
> +refnames bear no resemblance to the originals). The relative makeup of
> +the tree is retained (e.g., if you have a root tree with 10 files and 3
> +trees, so will the output), but their names and the contents of the
> +files will be replaced.
> +
> +If you think you have found a git bug, you can start by exporting an
> +anonymized stream of the whole repository:
> +
> +---------------------------------------------------
> +$ git fast-export --anonymize --all >anon-stream
> +---------------------------------------------------
> +
> +Then confirm that the bug persists in a repository created from that
> +stream (many bugs will not, as they really do depend on the exact
> +repository contents):
Dumb question (I have not even read the patch, so please just ignore me
if this is indeed dumb!): Is the map of <original-name, anonymized-name>
available to the user while they attempt to confirm that the bug is still
present?
For example, if I anonymized git.git, and did 'git branch -v' (say), how
easy would it be for me to recognise which branch was 'next'?
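
The question can be made concrete with a minimal sketch, in Python and
purely illustrative. It is not taken from the patch and says nothing
about how fast-export is implemented. The class `Anonymizer`, its
methods, and the "Ref N" label are hypothetical; only the "User 0" style
of numbering comes from the commit message quoted above.

```python
from collections import defaultdict


class Anonymizer:
    """Toy consistent-replacement map; NOT git's actual implementation."""

    def __init__(self):
        # One original -> token table per kind of data (e.g. "User", "Ref").
        self._maps = defaultdict(dict)

    def anonymize(self, kind, original):
        """Return the token for `original`, allocating a new one if needed."""
        table = self._maps[kind]
        if original not in table:
            # Tokens are numbered in order of first appearance.
            table[original] = f"{kind} {len(table)}"
        return table[original]

    def reverse_lookup(self, kind, token):
        """Recover the original string for a token, or None if unknown."""
        for original, anon in self._maps[kind].items():
            if anon == token:
                return original
        return None


anon = Anonymizer()
# Hypothetical branch names; the repeated "next" reuses its first token.
for branch in ["master", "next", "pu", "next"]:
    anon.anonymize("Ref", branch)

print(anon.reverse_lookup("Ref", anon.anonymize("Ref", "next")))
```

If the user could see a table like this, a reverse lookup would answer
"which branch was 'next'?" directly. Whether fast-export keeps or exposes
anything comparable is exactly what is being asked here.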
ATB,
Ramsay Jones