git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* commit-message attack for extracting sensitive data from rewritten Git history
@ 2013-04-07 23:17 Roberto Tyley
  2013-04-08 15:40 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Roberto Tyley @ 2013-04-07 23:17 UTC (permalink / raw)
  To: git

This is a demonstration of a mildly-interesting security concern
relating to Git & git-filter-branch - not a vulnerability in Git
itself, just in the way it can be used. I thought it was interesting
to demonstrate that there is sometimes an avenue of attack for
recovering sensitive data that's been removed from Git history using
git-filter-branch. I think it's a low-severity issue, you may wish to
ignore this, and indeed I've been very politely told already that it's
clearly nonsense :)

Here's an unmodified repo, in which the user unwisely committed a
database password:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-original/commit/8c9cfe3c

The unwise commit is reverted with a second commit using 'git revert',
which obviously leaves the password in Git history, and - some time
later - it's decided to properly clean the repo history with
git-filter-branch & git gc, purging the password so the repo can be
more widely shared (open-sourced, or just externally hosted).

git-filter-branch works exactly as intended, purging the password, but
the one thing it does not- typically - do is update the commit
message. So in the cleaned repo, the commit message for the revert
commit still looks like this:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-git-filter-branch-cleaned/commit/bf0637a5

It contains a commit id (8c9cfe3) which is no longer in the repo, but
can very easily be associated with an existing commit simply by
examining the subject line of the reverted commit ("Carelessly
checking password into source control"). It's also obvious, from
examining the repo, where the excised data was removed (ie at the
"db.password=" line). At this point it's possible to do a brute-force
attack where you generate possible passwords, insert them into the
available commit's tree, and compare them against the leaked commit
id. When the the commit id matches, the sensitive data has been
recovered.

A proof-of-concept implementation of this attack was indeed able to
recover the purged password:

--
$ java -jar gma-0.1.jar 8c9cfe3c attack-pinpoint
gma-demo-repo-git-filter-branch-cleaned

Brute-force search using these characters : 0123456789abcdefghijklmnopqrstuvwxyz
Available commit, presumed cleaned : 8ebbf661
File path : src/main/resources/config.properties
Template blob : dca1a2fb
Exhausted strings of length 1 or less
...
Exhausted strings of length 4 or less
Match with '0g6rw'
--

So all of this amounts to a fairly low severity issue - people should
always change credentials when they mistakenly commit them to a repo -
but I guess the point is that from a paranoia point of view, you want
to remove all information - including old commit hashes buried in
commit messages - that relate to sensitive data when you clean a repo
for sharing. The git-filter-branch command has a --msg-filter option
which could be used for this purpose, with the application of some
judicious bash-scripting, grep&sed-ing. However, I must confess that I
believe users would be better advised to use The BFG:

http://rtyley.github.io/bfg-repo-cleaner/

The BFG already addresses this issue by replacing all old Git
object-ids found in commit/tag messages with the updated id. For
instance, here's that exact same commit message when cleaned with the
BFG:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-bfg-cleaned/commit/35840201

In the case that the users specifies a filtering operation is not
removing 'private' data, the BFG replaces old ids with text of the
form '"newid [formerly oldid]", but if the operation is in fact to
strip private data, the replacement value is simply the newid - and
without the old commit id, the attack described above is not possible.

I believe it's worth educating users to give them a more realistic
understanding of their exposure, and would like to update the
documentation of git-filter-branch to give them a better idea of their
options for removing private data - that would include noting the BFG
as alternative.

- Roberto Tyley

https://github.com/rtyley/bfg-repo-cleaner/blob/v1.2.0/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdSubstitutor.scala#L33-L60

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-09 18:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-07 23:17 commit-message attack for extracting sensitive data from rewritten Git history Roberto Tyley
2013-04-08 15:40 ` Junio C Hamano
2013-04-08 21:54   ` Jeff King
2013-04-09  6:03     ` Johannes Sixt
2013-04-09 17:01       ` Jeff King
2013-04-09 18:08         ` Roberto Tyley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).