git.vger.kernel.org archive mirror
* Git Privacy
@ 2023-07-13 16:27 nick
  2023-07-13 17:11 ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: nick @ 2023-07-13 16:27 UTC (permalink / raw)
  To: git

A couple years ago, I created git-privacy[1]. In it, I explain how
having exact commit times in a Git repo, over a long enough timespan,
can potentially be used to deduce private information about a
developer's life. Then I go on to explain the steps to prevent this
private information leakage.

I know this is low on the list of priorities when it comes to increasing
one's digital privacy, but I think it still matters. It's certainly
relevant to developers who need to remain anonymous while version
controlling their public software.

I was wondering if it would be appropriate to implement a feature which
would allow for automatic obfuscation of Git committer and author
timestamps without the need to assign environment variables or use Git
hooks. Perhaps a config option to automatically set the date to a time
before Git was invented?

Might there be a better way to implement these ideas than what I'm
thinking? Please provide some feedback.


References:
1: https://git.nicholasjohnson.ch/git-privacy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Git Privacy
  2023-07-13 16:27 Git Privacy nick
@ 2023-07-13 17:11 ` Junio C Hamano
  2023-07-14  9:22   ` nick
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2023-07-13 17:11 UTC (permalink / raw)
  To: nick; +Cc: git

"nick" <nick@nicholasjohnson.ch> writes:

> hooks. Perhaps a config option to automatically set the date to a time
> before Git was invented?

For some use cases that are outside of how Git was designed to be
used, such configuration might be useful, but I am not yet convinced
that it is worth the engineering effort for this project to review,
accept and maintain changes to implement it.

Just my personal opinion, of course ;-)

After all, if you leave a series of commits that stress the fact that
you not just fail to keep, but deliberately avoid keeping, a
reliable record of when you made your changes, half the value of
keeping your work in a source code management system vanishes.  When
somebody comes to your project and says certain parts of your code
were stolen from their proprietary IP, wouldn't you rather be able
to produce the record of who did what at which time to refute their
claim by showing that your project members invented the code long
before they claim it was stolen from them?

* Re: Git Privacy
  2023-07-13 17:11 ` Junio C Hamano
@ 2023-07-14  9:22   ` nick
  2023-07-14 16:45     ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: nick @ 2023-07-14  9:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> "nick" <nick@nicholasjohnson.ch> writes:
>
> > hooks. Perhaps a config option to automatically set the date to a time
> > before Git was invented?
>
> [...] I am not yet convinced that it is worth the engineering effort
> for this project to review, accept and maintain changes to implement
> it.

Upon further thought, given that it's already pretty easy to accomplish
timestamp obfuscation, albeit clumsy, I concede that it may not be worth
the engineering effort to implement my original suggestion. So I'll drop
it.

However, I think it is worth the effort for the time zones. Is there any
reason Git doesn't automatically convert local time to UTC in timestamps
to prevent leaking the developer's time zone?

It seems like a simple change that would be good for the developer's
privacy without harming Git in any way. It would also be easy to
implement as backwards-compatible.

I've been told this idea was already mentioned, but it has been ignored
for some time:

https://git.issues.gerritcodereview.com/issues/40000039

The sooner it's addressed, the better since it means less personal
information leakage.

> After all, if you leave a series of commits that stress the fact that
> you not just fail to keep, but deliberately avoid keeping, a
> reliable record of when you made your changes, half the value of
> keeping your work in a source code management system vanishes. When
> somebody comes to your project and says certain parts of your code
> were stolen from their proprietary IP, wouldn't you rather be able
> to produce the record of who did what at which time to refute their
> claim by showing that your project members invented the code long
> before they claim it was stolen from them?

Thank you for bringing this up. This was not an angle I considered when
writing my repo git-privacy, but now I'll definitely warn about it there.

Your feedback above would not apply to the UTC time zone proposal I
linked to though. There is a good reason to implement it and, as far as
I can think of, no reason not to.

* Re: Git Privacy
  2023-07-14  9:22   ` nick
@ 2023-07-14 16:45     ` Junio C Hamano
  2023-07-15  4:32       ` nick
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2023-07-14 16:45 UTC (permalink / raw)
  To: nick; +Cc: git

"nick" <nick@nicholasjohnson.ch> writes:

>> "nick" <nick@nicholasjohnson.ch> writes:
>>
>> > hooks. Perhaps a config option to automatically set the date to a time
>> > before Git was invented?
>>
>> [...] I am not yet convinced that it is worth the engineering effort
>> for this project to review, accept and maintain changes to implement
>> it.
>
> Upon further thought, given that it's already pretty easy to accomplish
> timestamp obfuscation, albeit clumsy, I concede that it may not be worth
> the engineering effort to implement my original suggestion. So I'll drop
> it.
>
> However, I think it is worth the effort for the time zones. Is there any
> reason Git doesn't automatically convert local time to UTC in timestamps
> to prevent leaking the developer's time zone?

Actually it is the other way around, if I understand correctly.

Git could have been designed to discard that information like
previous version control systems, but it is another piece of
interesting information, and we made a conscious design decision to
keep it.  In other words, "is there any reason why we do not discard
the information?" is the wrong question to ask in the context of VCS.

I earlier said I am not yet convinced it is worth our time, and so
far I haven't heard anything new that may help me convince myself
yet.

Thanks.

* Re: Git Privacy
  2023-07-14 16:45     ` Junio C Hamano
@ 2023-07-15  4:32       ` nick
  2023-07-16 11:47         ` René Scharfe
  2023-07-16 23:07         ` nick
  0 siblings, 2 replies; 17+ messages in thread
From: nick @ 2023-07-15  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> "nick" <nick@nicholasjohnson.ch> writes:
>
> > However, I think it is worth the effort for the time zones. Is there any
> > reason Git doesn't automatically convert local time to UTC in timestamps
> > to prevent leaking the developer's time zone?
>
> Actually it is the other way around, if I understand correctly.
>
> Git could have been designed to discard that information like
> previous version control systems, but it is another piece of
> interesting information, and we made a conscious design decision to
> keep it. In other words, "is there any reason why we do not discard
> the information?" is the wrong question to ask in the context of VCS.

I'll make my best case one last time and if it doesn't convince you,
then I have nothing else to offer.

Git leaks private information about developers publicly by design
through its precise timestamps. You mentioned this makes it easier to
deny copyright claims, but one could get more or less the same benefit
without sacrificing privacy by rounding commit times to the nearest day.
I'm not advocating making this behavior the default, just that
developers be given the option to do it.

The time zones reveal private information about developers and they
don't even serve a use case, as far as I'm aware. A backwards-compatible
way to solve this leak would be to convert timestamps to UTC by default
and have a Git config option to revert back to the current behavior.

* Re: Git Privacy
  2023-07-15  4:32       ` nick
@ 2023-07-16 11:47         ` René Scharfe
  2023-07-16 22:52           ` nick
  2023-07-17  2:36           ` Junio C Hamano
  2023-07-16 23:07         ` nick
  1 sibling, 2 replies; 17+ messages in thread
From: René Scharfe @ 2023-07-16 11:47 UTC (permalink / raw)
  To: nick, Junio C Hamano; +Cc: git

On 15.07.23 at 06:32, nick wrote:
> Junio C Hamano wrote:
>> "nick" <nick@nicholasjohnson.ch> writes:
>>
>>> However, I think it is worth the effort for the time zones. Is there any
>>> reason Git doesn't automatically convert local time to UTC in timestamps
>>> to prevent leaking the developer's time zone?
>>
>> Actually it is the other way around, if I understand correctly.
>>
>> Git could have been designed to discard that information like
>> previous version control systems, but it is another piece of
>> interesting information, and we made a conscious design decision to
>> keep it. In other words, "is there any reason why we do not discard
>> the information?" is the wrong question to ask in the context of VCS.
>
> I'll make my best case one last time and if it doesn't convince you,
> then I have nothing else to offer.
>
> Git leaks private information about developers publicly by design
> through its precise timestamps. You mentioned this makes it easier to
> deny copyright claims, but one could get more or less the same benefit
> without sacrificing privacy by rounding commit times to the nearest day.
> I'm not advocating making this behavior the default, just that
> developers be given the option to do it.
>
> The time zones reveal private information about developers and they
> don't even serve a use case, as far as I'm aware. A backwards-compatible
> way to solve this leak would be to convert timestamps to UTC by default
> and have a Git config option to revert back to the current behavior.

I get it to some extent: timezone and timestamps are personal data, which
may only be collected and processed for a lawful purpose according to the
GDPR.  Git works just as well with timestamps that omit time of day and
timezone, so is there a valid reason to collect that information?  At
least that's how I understand it, and I'm certainly not a lawyer.

But Git is not a legal entity, it's just a command line program that you,
the data subject, control.  You can use the option --date or the
environment variable GIT_AUTHOR_DATE to set the author timestamp and the
variable GIT_COMMITTER_DATE to set the committer timestamp on commit.
Not sure why there is no command line option for the latter, hmm.
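
For illustration, both variables can be set together when git is
driven from a script.  The following is a minimal sketch, not part of
Git: the helper names `dated_env` and `commit_with_date` are
hypothetical, and the value is assumed to be passed in Git's internal
date format ("<unix-timestamp> <utc-offset>").

```python
import os
import subprocess


def dated_env(unix_ts, offset="+0000", base_env=None):
    """Return a copy of the environment with both Git date variables set.

    `dated_env` is a hypothetical helper, not a Git API.  The value uses
    Git's internal date format: "<unix-timestamp> <utc-offset>".
    """
    env = dict(os.environ if base_env is None else base_env)
    stamp = f"{int(unix_ts)} {offset}"
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def commit_with_date(message, unix_ts, repo="."):
    """Run `git commit` with author and committer dates both overridden."""
    return subprocess.run(
        ["git", "commit", "-m", message],
        cwd=repo,
        env=dated_env(unix_ts),
        check=True,
    )
```

Setting both variables matters here because --date alone only changes
the author timestamp; the committer timestamp would still record the
wall clock.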

So I see this more as a usability issue.  Git allows its users to tailor
commits to suit their needs in many ways.  You can edit file contents,
history and metadata.  For timestamp and timezone this isn't as
convenient as it could be.  If git commit has a --signoff option that
can be enabled by default then adding config options for controlling
timestamp granularity is hard to say no to.

René


* Re: Git Privacy
  2023-07-16 11:47         ` René Scharfe
@ 2023-07-16 22:52           ` nick
  2023-07-17  2:36           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: nick @ 2023-07-16 22:52 UTC (permalink / raw)
  To: René Scharfe, Junio C Hamano; +Cc: git

René Scharfe wrote:
> nick wrote:
> > Git leaks private information about developers publicly by design
> > through its precise timestamps. You mentioned this makes it easier to
> > deny copyright claims, but one could get more or less the same benefit
> > without sacrificing privacy by rounding commit times to the nearest day.
> > I'm not advocating making this behavior the default, just that
> > developers be given the option to do it.
>
> I get it to some extent: timezone and timestamps are personal data,
> which may only be collected and processed for a lawful purpose according to
> the GDPR.

> But Git is not a legal entity, it's just a command line program that you,
> the data subject, control.

As far as I know, and I'm not a lawyer either, there are no legal issues
related to this. To be clear, my argument is more a moral one, not a
legal one.

> So I see this more as a usability issue. Git allows its users to tailor
> commits to suit their needs in many ways. You can edit file contents,
> history and metadata. For timestamp and timezone this isn't as
> convenient as it could be. If git commit has a --signoff option that
> can be enabled by default then adding config options for controlling
> timestamp granularity is hard to say no to.

You're right that usability is not as good as it could be for those who
want more privacy.

Many of the i2p devs are known only under pseudonyms. They definitely
don't want their timezones leaked while developing i2p. I imagine that
they have other repos that they develop non-anonymously. They could
create a separate shell alias for Git with coarse-grained timestamps and
no timezone, but it would still require a lot of mental bookkeeping to
remember to use the alias every time. A single mistake would leak their
timezone.

Git could solve this by offering a per-repo option that controls
timestamp granularity.

* Re: Git Privacy
  2023-07-15  4:32       ` nick
  2023-07-16 11:47         ` René Scharfe
@ 2023-07-16 23:07         ` nick
  2023-07-16 23:27           ` Jason Pyeron
  2023-07-18 21:59           ` brian m. carlson
  1 sibling, 2 replies; 17+ messages in thread
From: nick @ 2023-07-16 23:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

nick wrote:
> The time zones reveal private information about developers and they
> don't even serve a use case, as far as I'm aware. A backwards-compatible
> way to solve this leak would be to convert timestamps to UTC by default
> and have a Git config option to revert back to the current behavior.

Come to think of it, even if timezones were converted to UTC by default,
time of day would still leak information about a user's likely timezone.

So based on that and keeping in mind Git's desire for strong
backwards-compatibility, I'm amending my proposal to just a standalone
Git option which would allow for forging timestamp and timezone
information, with timestamp information being forgeable to varying
degrees of granularity.

A new Git option is appropriate because Git doesn't already have
features which make this possible. So it would be necessary to implement
a new option anyways.

* RE: Git Privacy
  2023-07-16 23:07         ` nick
@ 2023-07-16 23:27           ` Jason Pyeron
  2023-07-17  4:20             ` nick
  2023-07-18 21:59           ` brian m. carlson
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Pyeron @ 2023-07-16 23:27 UTC (permalink / raw)
  To: 'nick', 'Junio C Hamano'; +Cc: git

> -----Original Message-----
> From: nick 
> Sent: Sunday, July 16, 2023 7:07 PM
> Subject: Re: Git Privacy
> 
> nick wrote:
> > The time zones reveal private information about developers and they
> > don't even serve a use case, as far as I'm aware. A backwards-compatible
> > way to solve this leak would be to convert timestamps to UTC by default
> > and have a Git config option to revert back to the current behavior.
> 
> Come to think of it, even if timezones were converted to UTC by default,
> time of day would still leak information about a user's likely timezone.

Discussed this with our policy wonks...

Short answer - no. There is no legal assumption that can be made - your
work hours cannot be assumed to be 9-5. They also said that time zone is
"too broad at 1/24th of the world", but understood the concern.

That being said the recommendation is to add --privacy

Where it assumes some defaults and those defaults can be controlled in
your config or via --privacy=option1,option2

And then some of the options can be:

date-timezone=UTC

date-precision=8hour

etc...
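
As a rough illustration of how such a value might be parsed, here is a
minimal sketch.  Everything in it is an assumption: Git has no
--privacy flag, the function names are hypothetical, and the unit
names are guessed from the single "8hour" example above.

```python
import re

# Hypothetical unit names; the thread only shows "8hour" as an example.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_precision(text):
    """Turn a string such as "8hour" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text)
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised precision: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_privacy_options(value):
    """Parse "key=value,key=value" as it might follow a --privacy= flag."""
    options = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        options[key.strip()] = val.strip()
    if "date-precision" in options:
        options["date-precision"] = parse_precision(options["date-precision"])
    return options
```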

v/r,

Jason Pyeron

--
Jason Pyeron  | Architect
Contractor    |
PD Inc        | Certified SBA 8(a)
10 w 24th St  | Certified SBA HUBZone
Baltimore, MD | CAGE Code: 1WVR6

.mil: jason.j.pyeron.ctr@mail.mil
.com: jpyeron@pdinc.us
tel : 202-741-9397



* Re: Git Privacy
  2023-07-16 11:47         ` René Scharfe
  2023-07-16 22:52           ` nick
@ 2023-07-17  2:36           ` Junio C Hamano
  2023-07-17  2:57             ` Junio C Hamano
  2023-07-17 16:37             ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2023-07-17  2:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: nick, git

René Scharfe <l.s.r@web.de> writes:

> But Git is not a legal entity, it's just a command line program that you,
> the data subject, control.  You can use the option --date or the
> environment variable GIT_AUTHOR_DATE to set the author timestamp and the
> variable GIT_COMMITTER_DATE to set the committer timestamp on commit.
> Not sure why there is no command line option for the latter, hmm.

For two reasons.

 * While using the GIT_AUTHOR_DATE environment variable is perfectly
   adequate (after all, we did not have the option before Git 1.7.0,
   released in Feb 2010), overriding the author time with "--date"
   had a good reason to exist, unlike the committer timestamp.

   Imagine you were relayed somebody else's changes, not via a
   format that is kosher and acceptable by "git am", but somehow
   managed to reproduce in your working tree.  If you also have
   learned when and in which timezone the original author made that
   change, you'd want to have a way to record it.

 * Having a system clock that can randomly go backwards, and using
   such a system clock to record the committer timestamp, has broken
   "git log" in mergy-bushy histories.  This issue has been somewhat
   mitigated by introduction of generation numbers, but traversing
   the commits in the newer part of the history that are not yet
   covered by commit-graph would be affected if you let your commit
   timestamps go back and forth deliberately.

> So I see this more as a usability issue.  Git allows its users to tailor
> commits to suit their needs in many ways.  You can edit file contents,
> history and metadata.  For timestamp and timezone this isn't as
> convenient as it could be.

I think the existing two environment variables are very good place
to draw the line.  When we start talking about "privacy", just like
"security", the exact details of the design and the implementation
would affect the resulting quality of the "privacy enhancing
features", but our primary mission is source code control and we are
not equipped to even measure how good our implementation would be.

Just like we do not pretend to be security engineers and do not
invent our own implementations of the hash functions and secure
network transports (instead we let third parties implement them
and just use them), we should NOT be adding a "--privacy" option
that picks rand(24)*60 as UTC offset and pretends that it is the
timezone of the author, and picks some random timestamp between the
timestamp of the latest commit in the repository and the actual
wallclock timestamp and pretends that is the author time.  After
all, our project is not about coming up with quality time
obfuscation.

But the good thing is that privacy-minded folks can write a quality
implementation of a much better design to lie about the timezone and
the current time, preferably (but not necessarily) within
the constraint that the time should not go backwards, which would
help Git.  Once such an external program is written, the users can
arrange that the program is called every time the shell gives the
control back to the user to set its output to GIT_AUTHOR_DATE.  Zsh
has a precmd mechanism that you can use to invoke such a program
before each prompt; bash has PROMPT_COMMAND that can be used in a
similar way.
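
A minimal sketch of what such an external program might compute is
given here.  It is only an illustration, not a quality privacy design:
the function name is hypothetical, the one-day granularity is an
arbitrary choice, and the caller is assumed to supply the previous
commit's timestamp (for example from `git log -1 --format=%ct`).

```python
import time


def coarse_monotonic_stamp(now, last_commit_ts=None, granularity=86400):
    """Pick a coarse timestamp that always moves forward.

    Hypothetical helper: `now` and `last_commit_ts` are Unix timestamps,
    `granularity` is in seconds.  The result is `now` rounded down to the
    granularity, or one second after the previous commit when the rounded
    value would not be later than that commit.
    """
    rounded = int(now) - int(now) % granularity
    if last_commit_ts is not None and rounded <= last_commit_ts:
        rounded = int(last_commit_ts) + 1
    return rounded


if __name__ == "__main__":
    # Print a value in Git's internal date format, always claiming UTC.
    print(f"{coarse_monotonic_stamp(time.time())} +0000")
```

A shell hook could then export the printed value as GIT_AUTHOR_DATE
before each prompt; the hook wiring is left out of this sketch.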

Needless to say, such a "privacy enhancing `date` command" can be
used outside the context of Git, too.  My point is that it is not
within the scope of this project to add an internal implementation
of such a command and drive that from a command line option or a
configuration variable.

Thanks.

* Re: Git Privacy
  2023-07-17  2:36           ` Junio C Hamano
@ 2023-07-17  2:57             ` Junio C Hamano
  2023-07-17  5:36               ` nick
  2023-07-17 16:37             ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2023-07-17  2:57 UTC (permalink / raw)
  To: René Scharfe; +Cc: nick, git

Junio C Hamano <gitster@pobox.com> writes:

> and just use them), we should NOT be adding a "--privacy" option
> that picks rand(24)*60 as UTC offset and pretends that it is the
> timezone of the author, and picks some random timestamp between the
> timestamp of the latest commit in the repository and the actual
> wallclock timestamp and pretends that is the author time.  After
> all, our project is not about coming up with quality time
> obfuscation.

We could go to the extreme in the complete opposite, if we do not
care about the quality of the "privacy" feature, and you could
probably talk me into adopting below as long as the option or the
configuration are not named with the word "privacy" in them (a
"--useless-time" option, or a "core.uselesstime" configuration
variable, are OK).

When the feature is in effect, all timestamps in commit and tag
objects pretend to be in UTC timezone, and

 (1) the commits record the Epoch as their timestamps if there is no
     parent;

 (2) the commits record, as their timestamps, one second after the
     largest of the timestamps of all their parents;

 (3) in any case, the same (phoney) timestamp is used for author and
     committer.

 (4) the tags record the Epoch as their timestamp if they point at
     trees or blobs.

 (5) the tags record one second after the largest timestamp of
     the pointee as their timestamp, if they point at tags or commits.

 (6) as the reflog is a local matter, its timestamp may be local,
     but it is OK if it ends up being just a useless number if that
     is more convenient to implement.
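
The rules above can be sketched in a few lines.  This is only an
illustration of the arithmetic, with hypothetical function names; it
does not touch Git's object format or the reflog rule (6).

```python
EPOCH = 0  # Unix timestamp 0, i.e. 1970-01-01T00:00:00Z


def useless_commit_time(parent_times):
    """Rules (1) and (2): Epoch for a root commit, else max(parents) + 1."""
    parent_times = list(parent_times)
    if not parent_times:
        return EPOCH
    return max(parent_times) + 1


def useless_tag_time(pointee_time=None):
    """Rules (4) and (5): Epoch for trees/blobs, else pointee time + 1.

    Pass None when the tag points at a tree or blob, which carry no time.
    """
    return EPOCH if pointee_time is None else pointee_time + 1


def ident_date(timestamp):
    """Rule (3) and the UTC rule: one string for both author and committer."""
    return f"{timestamp} +0000"
```

With these rules a commit's timestamp is always later than that of
every ancestor, so timestamps stay monotonic along any path of the
history even across merges.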

The resulting history will be shouting that "I am privacy conscious
and hiding my activities behind a fake clock" in capital letters,
which I would not call a quality design of a privacy feature, but it
does completely dissociate the wallclock time from the recorded
history without breaking the monotonicity of timestamps in the
recorded history.

When the useless-time feature is in use, you cannot expect features
like "git log --since" to work sensibly, but that is a given, I
would guess.

* Re: Git Privacy
  2023-07-16 23:27           ` Jason Pyeron
@ 2023-07-17  4:20             ` nick
  0 siblings, 0 replies; 17+ messages in thread
From: nick @ 2023-07-17  4:20 UTC (permalink / raw)
  To: Jason Pyeron, 'Junio C Hamano'; +Cc: git

Jason Pyeron wrote:
> > nick wrote:
> > Come to think of it, even if timezones were converted to UTC by default,
> > time of day would still leak information about a user's likely timezone.
>
> Discussed this with our policy wonks...
>
> Short answer - no. There is no legal assumption that can be made - your
> work hours cannot be assumed to be 9-5. They also said that time zone is
> "too broad at 1/24th of the world", but understood the concern.

An adversary may have other information which can be correlated with the
timestamps or timezone, making them less benign than in isolation.

> That being said the recommendation is to add --privacy

I'm not familiar with the processes here. Is it my responsibility to
implement it since I proposed it, or will someone else implement it?

> Where it assumes some defaults and those defaults can be controlled in
> your config or via --privacy=option1,option2
>
> And then some of the options can be:
>
> date-timezone=UTC
>
> date-precision=8hour

This sounds great. A few preliminary ideas on implementation:

'date-precision' must round the author AND committer timestamps;
otherwise it's useless.

'date-precision' must round down, never into the future.

'date-timezone' must convert the date from local time and not just
replace the timezone.
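
To make the three points concrete, here is a minimal sketch under
stated assumptions: the function names are hypothetical, the 8-hour
default is taken from the 'date-precision=8hour' example, and the
output uses Git's internal date format.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt):
    """Convert an aware datetime to UTC, keeping the same instant."""
    if local_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc)


def round_down(dt_utc, precision_seconds):
    """Round a UTC datetime down to a multiple of `precision_seconds`."""
    ts = int(dt_utc.timestamp())
    return datetime.fromtimestamp(ts - ts % precision_seconds, tz=timezone.utc)


def obfuscated_date(local_dt, precision_seconds=8 * 3600):
    """Return one Git-style date string to use for author AND committer."""
    rounded = round_down(to_utc(local_dt), precision_seconds)
    return f"{int(rounded.timestamp())} +0000"


if __name__ == "__main__":
    # 06:20 at UTC+2 is 04:20 UTC, which falls in the 00:00 UTC bucket.
    plus_two = timezone(timedelta(hours=2))
    print(obfuscated_date(datetime(2023, 7, 17, 6, 20, tzinfo=plus_two)))
```

Converting first and rounding second means the instant is preserved up
to the chosen precision, rather than relabelling local wall-clock time
as UTC.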

Any thoughts on making 'date-precision' also apply to GnuPG signature
timestamps? It's possible to specify a custom GnuPG command which does
this using gpg.program, but it's inconvenient. The relevant GnuPG option
is '--faked-system-time <epoch>!'

If that idea is no good, there should at least be a warning displayed
when the user signs anything with GnuPG with 'date-precision' enabled.

If that idea is good, then there should be a conditional check that the
rounding performed by 'date-precision' does not round down to before the
signing key was generated. Otherwise the signature will be invalid.
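
A wrapper configured through gpg.program could build its command line
roughly as follows.  This is a sketch only: the function name is
hypothetical, the arguments Git passes are forwarded unchanged, and
the key creation time is assumed to be known to the caller.

```python
def faked_time_gpg_argv(gpg_args, rounded_ts, key_created_ts, gpg="gpg"):
    """Build a gpg command line that freezes the clock at `rounded_ts`.

    Hypothetical helper for a wrapper named by `gpg.program`.  The
    trailing "!" asks GnuPG to freeze its clock at the given time.
    Raises if the rounded time predates the signing key, since such a
    signature would not verify.
    """
    if rounded_ts < key_created_ts:
        raise ValueError("rounded time is earlier than key creation")
    return [gpg, "--faked-system-time", f"{int(rounded_ts)}!", *gpg_args]
```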

* Re: Git Privacy
  2023-07-17  2:57             ` Junio C Hamano
@ 2023-07-17  5:36               ` nick
  2023-07-17 20:57                 ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: nick @ 2023-07-17  5:36 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > and just use them), we should NOT be adding a "--privacy" option
> > that picks rand(24)*60 as UTC offset and pretends that it is the
> > timezone of the author, and picks some random timestamp between the
> > timestamp of the latest commit in the repository and the actual
> > wallclock timestamp and pretends that is the author time.  After
> > all, our project is not about coming up with quality time
> > obfuscation.
>
> We could go to the extreme in the complete opposite, if we do not
> care about the quality of the "privacy" feature, and you could
> probably talk me into adopting below as long as the option or the
> configuration are not named with the word "privacy" in them (a
> "--useless-time" option, or a "core.uselesstime" configuration
> variable, are OK).

I hadn't considered it in my other responses, but calling it --privacy
would be a bad idea for exactly the reasons you laid out. Calling it
--useless-time would be better.

> When the feature is in effect, all timestamps in commit and tag
> objects pretend to be in UTC timezone, and
>
> (1) the commits record the Epoch as their timestamps if there is no
> parent;
>
> (2) the commits record, as their timestamps, one second after the
> largest of the timestamps of all their parents;
>
> (3) in any case, the same (phoney) timestamp is used for author and
> committer.
>
> (4) the tags record the Epoch as their timestamp if they point at
> trees or blobs.
>
> (5) the tags record one second after the largest timestamp of
> the pointee as their timestamp, if they point at tags or commits.
>
> (6) as the reflog is a local matter, its timestamp may be local,
> but it is OK if it ends up being just a useless number if that
> is more convenient to implement.

You're the expert on Git's internals and clearly know best how to
implement this with the least amount of breakage. So I can't comment on
that.

I will say these points seem to be sufficient to satisfy the privacy use
case. I don't think any more can reasonably be expected.

> The resulting history will be shouting that "I am privacy conscious
> and hiding my activities behind a fake clock" in capital letters,
> which I would not call a quality design of a privacy feature, but it
> does completely dissociate the wallclock time from the recorded
> history without breaking the monotonicity of timestamps in the
> recorded history.

Depending on one's threat model, revealing the fact that one is using a
privacy feature/tool isn't necessarily a problem. I agree that perhaps a
really high-quality implementation of a privacy feature could do this,
but I think that's outside the scope and way too much to expect from
devs as you said.

> When the useless-time feature is in use, you cannot expect features
> like "git log --since" to work sensibly, but that is a given, I
> would guess.

There could be a warning in the documentation that this feature may
cause breakage.

* Re: Git Privacy
  2023-07-17  2:36           ` Junio C Hamano
  2023-07-17  2:57             ` Junio C Hamano
@ 2023-07-17 16:37             ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2023-07-17 16:37 UTC (permalink / raw)
  To: René Scharfe; +Cc: nick, git

Junio C Hamano <gitster@pobox.com> writes:

> René Scharfe <l.s.r@web.de> writes:
>
>> Not sure why there is no command line option for the latter, hmm.
>
> For two reasons.
>
>  * While using the GIT_AUTHOR_DATE environment variable is perfectly
>    adequate (after all, we did not have the option before Git 1.7.0,
>    released in Feb 2010), overriding the author time with "--date"
>    had a good reason to exist, unlike the committer timestamp.
>
>    Imagine you were relayed somebody else's changes, not via a
>    format that is kosher and acceptable by "git am", but somehow
>    managed to reproduce in your working tree.  If you also have
>    learned when and in which timezone the original author made that
>    change, you'd want to have a way to record it.

The point here is that tweaking the author time has a general
utility that is wider than "I want to use a timestamp that is
disconnected from reality"---in fact, it is quite the opposite.
The above example is using the option as a tool to record what
actually happened in reality and not about using a fake time at
which nothing related to the resulting commit happened.  That is why
I said that it "has a good reason to exist".

Contrasted to this, the committer timestamp is about when the commit
was created, and there is no need to have an easy access from the
command line like "--date" option does to tweak it [*].  It is
updated to the current time when you "commit --amend" for exactly
the same reason.

> I think the existing two environment variables are very good place
> to draw the line.
> ...
> Needless to say, such a "privacy enhancing `date` command" can be
> used outside the context of Git, too.  My point is that it is not
> within the scope of this project to add an internal implementation
> of such a command and drive that from a command line option or a
> configuration variable.

And I still think this is a reasonable way forward.  We have offered
two environment variables for their use and it is up to them to use
it when driving "git" binary from their environment.  Anything more
is a distraction to this project, I would think.


[Footnote]

 * As long as your system clock is reasonably accurate and does not
   need constant tweaking, that is.  If not, you have a more serious
   problem than tweaking the string on the committer line---none of
   the timestamp-based heuristics, like "make" not rebuilding things
   unnecessarily, would work on your system.

* Re: Git Privacy
  2023-07-17  5:36               ` nick
@ 2023-07-17 20:57                 ` Theodore Ts'o
  2023-07-17 22:49                   ` nick
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2023-07-17 20:57 UTC (permalink / raw)
  To: nick; +Cc: Junio C Hamano, René Scharfe, git

On Mon, Jul 17, 2023 at 05:36:48AM +0000, nick wrote:
> 
> I hadn't considered it in my other responses, but calling it --privacy
> would be a bad idea for exactly the reasons you laid out. Calling it
> --useless-time would be better.

It might also be worth pointing out that someone still might be able
to figure out information from when a branch gets pushed to a git
repo.  Even if the time in the timestamp is randomized, when someone
sends a pull request to github is not going to be randomized.  Or if
someone pushes their branch to github, and github actions is set up to
automatically kick off regression tests as soon as the branch changes,
this can also leak information about when the push happened.

There are also integration test systems, such as the gce-xfstests's
lightweight test manager, which polls the branch every 15 minutes, and
the moment the branch changes, tests immediately start running and the
timestamp when the test was kicked off is encoded in the testrunid.

Which is why, quite frankly, I'm a bit dubious about the whole "I must
obfuscate the time zone from which I am operating", as something
that's really worth the effort, since it has a lot of downsides, and
if the user is not careful, they may end up leaking information about
when they are active anyway....

					- Ted


* Re: Git Privacy
  2023-07-17 20:57                 ` Theodore Ts'o
@ 2023-07-17 22:49                   ` nick
  0 siblings, 0 replies; 17+ messages in thread
From: nick @ 2023-07-17 22:49 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Junio C Hamano, René Scharfe, git

Theodore Ts'o wrote:
> It might also be worth pointing out that someone still might be able
> to figure out information from when a branch gets pushed to a git
> repo.

Github and other forges are actually known to track this information.

> [...]
>
> Which is why, quite frankly, I'm a bit dubious about the whole "I must
> obfuscate the time zone from which I am operating", as something
> that's really worth the effort, since it has a lot of downsides, and
> if the user is not careful, they may end up leaking information about
> when they are active anyway....

I'm fine with the argument that it causes breakage, but I disagree
with the idea that it shouldn't be implemented because "they
may end up leaking information about when they are active anyway".

That is a defeatist argument that applies to many privacy technologies
that exist. Taken to its logical conclusion, it says "Let's not try to
improve privacy ever because the same information may be obtainable in
other ways." One has to start somewhere.


* Re: Git Privacy
  2023-07-16 23:07         ` nick
  2023-07-16 23:27           ` Jason Pyeron
@ 2023-07-18 21:59           ` brian m. carlson
  1 sibling, 0 replies; 17+ messages in thread
From: brian m. carlson @ 2023-07-18 21:59 UTC (permalink / raw)
  To: nick; +Cc: Junio C Hamano, git

On 2023-07-16 at 23:07:06, nick wrote:
> nick wrote:
> > The time zones reveal private information about developers and they
> > don't even serve a use case, as far as I'm aware. A backwards-compatible
> > way to solve this leak would be to convert timestamps to UTC by default
> > and have a Git config option to revert back to the current behavior.
> 
> Come to think of it, even if timezones were converted to UTC by default,
> time of day would still leak information about a user's likely timezone.

This is true.  My .signature indicates where I'm located (which isn't a
secret), but I have `TZ=UTC` set in my shell config.  You'll notice that
my timestamp is +0000 in all my commits.  I keep a reasonably regular
daytime schedule, so it's easy to tell what my hours are.
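
A minimal sketch of why UTC timestamps still show a daily rhythm: it
counts commits per UTC hour of day.  The function name
utc_hour_histogram and the sample timestamps are made up for
illustration; real input could come from something like
"git log --format=%at", which prints author times as Unix timestamps.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_hour_histogram(unix_timestamps):
    """Count commits per hour of day (UTC) from Unix timestamps."""
    hours = (
        datetime.fromtimestamp(ts, tz=timezone.utc).hour
        for ts in unix_timestamps
    )
    return Counter(hours)


if __name__ == "__main__":
    # Synthetic timestamps for illustration only: commits at 13:00,
    # 14:00, 14:30 and 21:00 UTC on 2023-07-17.
    day = 1689552000  # 2023-07-17 00:00:00 UTC
    sample = [
        day + 13 * 3600,
        day + 14 * 3600,
        day + 14 * 3600 + 1800,
        day + 21 * 3600,
    ]
    for hour, count in sorted(utc_hour_histogram(sample).items()):
        print(f"{hour:02d}:00 UTC  {'#' * count}")
```

Over enough commits, the hours that stay empty hint at when the author
sleeps, whatever offset is recorded in the commit.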

> So based on that and keeping in mind Git's desire for strong
> backwards-compatibility, I'm amending my proposal to just a standalone
> Git option which would allow for forging timestamp and timezone
> information, with timestamp information being forgeable to varying
> degrees of granularity.

One thing I've wanted Git to do (which I'm not sure is backwards
compatible) is to set the timezone to -0000 (instead of +0000) to
indicate that the user has intentionally refused to set the timezone,
much like the equivalent syntax in RFC 5322.  I think that's a fine
choice for lots of reasons, but it prevents people from accidentally
concluding that I live in Reykjavík and expecting a response from me
when I'm actually in bed.
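
As a sketch of the RFC 5322 convention referred to above, Python's
standard email.utils module distinguishes the two spellings.  This
shows the email library's behaviour only; it does not show how Git
treats a -0000 offset.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# "+0000" states that the local zone is UTC: the result is timezone-aware.
aware = parsedate_to_datetime("Tue, 18 Jul 2023 21:59:00 +0000")
# "-0000" means the time is in UTC but the local zone is unstated:
# the result is a naive datetime with no tzinfo attached.
unstated = parsedate_to_datetime("Tue, 18 Jul 2023 21:59:00 -0000")

print(aware.tzinfo is None)     # False
print(unstated.tzinfo is None)  # True

# Formatting follows the same rule: naive datetimes get "-0000",
# datetimes that are explicitly UTC get "+0000".
print(format_datetime(datetime(2023, 7, 18, 21, 59)))
print(format_datetime(datetime(2023, 7, 18, 21, 59, tzinfo=timezone.utc)))
```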

I'd support a command-line and config option that did that, in addition
to an option that adjusted the timezone.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


Thread overview: 17+ messages
2023-07-13 16:27 Git Privacy nick
2023-07-13 17:11 ` Junio C Hamano
2023-07-14  9:22   ` nick
2023-07-14 16:45     ` Junio C Hamano
2023-07-15  4:32       ` nick
2023-07-16 11:47         ` René Scharfe
2023-07-16 22:52           ` nick
2023-07-17  2:36           ` Junio C Hamano
2023-07-17  2:57             ` Junio C Hamano
2023-07-17  5:36               ` nick
2023-07-17 20:57                 ` Theodore Ts'o
2023-07-17 22:49                   ` nick
2023-07-17 16:37             ` Junio C Hamano
2023-07-16 23:07         ` nick
2023-07-16 23:27           ` Jason Pyeron
2023-07-17  4:20             ` nick
2023-07-18 21:59           ` brian m. carlson
