* checking sha1's of files
@ 2009-02-08 9:39 Caleb Cushing
2009-02-08 9:58 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Caleb Cushing @ 2009-02-08 9:39 UTC (permalink / raw)
To: git
I need to check the hashes of specific files in the repo in an
automated fashion, in another tool.
To be less vague: currently gentoo's portage tree has manifests for
each file in the tree. On funtoo and regen2 (forks) we've imported the
tree into git. Git already has most of the manifesting that's needed (it
still doesn't help with files outside the tree). I'd like to be
able to remove manifests from the tree; however, I still want to check
that the ebuilds (package format) are consistent at run time. Checking
the entire tree is not sane.
I figure the best way to do this is to first check stat against the
index, then, if that passes, check the sha1, and if that passes, continue
to the next step.
I don't want to do anything like parse the output of a git command
in my code; I'd rather see whether the check passed or failed
using return codes or some such. If git is capable of checking these
but would require me to parse output, I'd still like to know, as it may
let me get the fix in faster, and I can do better later.
I know git may not currently be capable of this behavior, which means
I should extend it, or even write a new program to deal with it. If
this is the case, is there any documentation on how git does this,
aside from the source? Could someone point me in the general direction
of the source files I should be looking at? Maybe even specific functions?
Any help with this endeavor would be appreciated, as the
manifests cause the repo to balloon, not to mention that they are just a
pain to manage as they can't actually be merged.
--
Caleb Cushing
http://xenoterracide.blogspot.com
* Re: checking sha1's of files
2009-02-08 9:39 checking sha1's of files Caleb Cushing
@ 2009-02-08 9:58 ` Junio C Hamano
2009-02-08 10:53 ` Caleb Cushing
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2009-02-08 9:58 UTC (permalink / raw)
To: Caleb Cushing; +Cc: git
Caleb Cushing <xenoterracide@gmail.com> writes:
> I need to check the hashes of specific files in the repo in an
> automated fashion, in another tool.
What "hash" are you talking about? sha1? md5? crc?
I *think* you are trying to say that gentoo has a tool to compute some
sort of hash for regular files in their source tree by:
"gentoo's portage tree has manifests for each file"
but without knowing what kind of hash they use, I cannot tell you if you
can reuse some part of git to compute their hash without using their tools
(it also is unclear why you are not using their tool to compute their hash
and instead are expecting git to know about the specific hash function
used by them).
For example,
"sha1sum Makefile"
would give you the SHA-1 checksum of the contents of the Makefile. Is
that what gentoo's tools expect? If that is the case, that is different
from the blob object name git will give to the contents of that Makefile,
so you cannot reuse much of git.
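As a rough illustration of that difference: git names a blob by running SHA-1 over a short header (the word "blob", the size in bytes, and a NUL byte) followed by the file contents, so the result is not what sha1sum prints for the same file. A minimal sketch, assuming a POSIX shell with sha1sum, wc, tr, and cut available; the helper names git_blob_sha1 and plain_sha1 are made up for this example:

```shell
# Hypothetical helper: compute the git blob name of a file without calling git.
# git hashes the header "blob <size-in-bytes>" plus a NUL byte, then the contents.
git_blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Plain SHA-1 of the contents only, which is what "sha1sum Makefile" reports.
plain_sha1() {
    sha1sum < "$1" | cut -d' ' -f1
}
```

For a given file, git_blob_sha1 should agree with git hash-object when no content filters (such as line-ending conversion) are configured, while plain_sha1 gives a different value for the same bytes.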
* Re: checking sha1's of files
2009-02-08 9:58 ` Junio C Hamano
@ 2009-02-08 10:53 ` Caleb Cushing
2009-02-08 11:13 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Caleb Cushing @ 2009-02-08 10:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, Feb 8, 2009 at 4:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Caleb Cushing <xenoterracide@gmail.com> writes:
>
>> I need to check the hashes of specific files in the repo in an
>> automated fashion, in another tool.
>
> What "hash" are you talking about? sha1? md5? crc?
Just at a vague glance, currently sha1, sha256, and rmd160, if I'm
reading the manifest file correctly.
> I *think* you are trying to say that gentoo has a tool to compute some
> sort of hash for regular files in their source tree by:
>
> "gentoo's portage tree has manifests for each file"
>
> but without knowing what kind of hash they use, I cannot tell you if you
> can reuse some part of git to compute their hash without using their tools
> (it also is unclear why you are not using their tool to compute their hash
> and instead are expecting git to know about the specific hash function
> used by them).
I think you misunderstand. I'm not trying to use git to compute their
hash, I'm trying to replace their hash with git. Once I've figured out
how to validate the git hash with 'emerge' (said tool) I will be
removing their hashes.
> For example,
>
> "sha1sum Makefile"
>
> would give you the SHA-1 checksum of the contents of the Makefile. Is
> that what gentoo's tools expect?
I'm honestly not sure how it's all calculated, only that the gentoo
tool will have to be modified. The only part of the Manifest file that
will be left is the check used to validate files not in the
tree (e.g. our distfiles, package tarballs).
> If that is the case, that is different
> from the blob object name git will give to the contents of that Makefile,
> so you cannot reuse much of git.
>
Git has its own internal integrity check on the blobs, right?
I don't intend to make git fit gentoo's system; I intend to make
gentoo (or regen2 rather) use git's integrity system. But integrity
checks are only a small part of the larger tool. I have the feeling
that I will have to write code on both ends to make it work. Gentoo
uses rsync and no git repository, so to validate integrity they use
files with recorded hashes in them. As I understand it, git hashes the
files (blobs) internally, so now that we've imported the rsync tree
and started using the 'git' protocol we're really doing the same thing
twice, and recording it more than that.
Has this explanation helped?
--
Caleb Cushing
http://xenoterracide.blogspot.com
* Re: checking sha1's of files
2009-02-08 10:53 ` Caleb Cushing
@ 2009-02-08 11:13 ` Jeff King
2009-02-08 12:22 ` Caleb Cushing
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2009-02-08 11:13 UTC (permalink / raw)
To: Caleb Cushing; +Cc: Junio C Hamano, git
On Sun, Feb 08, 2009 at 05:53:31AM -0500, Caleb Cushing wrote:
> > but without knowing what kind of hash they use, I cannot tell you if you
> > can reuse some part of git to compute their hash without using their tools
> > (it also is unclear why you are not using their tool to compute their hash
> > and instead are expecting git to know about the specific hash function
> > used by them).
>
> I think you misunderstand. I'm not trying to use git to compute their
> hash, I'm trying to replace their hash with git. Once I've figured out
> how to validate the git hash with 'emerge' (said tool) I will be
> removing their hashes.
I'm still not sure I entirely understand what you are trying to do, but
these building blocks may help.
You can see git's idea of the hash of a file in history (or in the
index) by asking rev-parse:
  # hash of 'Makefile' in the most recent commit on the current branch
  $ git rev-parse HEAD:Makefile
  27b9569746179e68c635bdaab8e57395f63faf01
  # hash of 'Makefile' in the index
  $ git rev-parse :Makefile
  27b9569746179e68c635bdaab8e57395f63faf01
  # hash of 'Makefile' in some arbitrary revision
  $ git rev-parse v1.5.1:Makefile
  b159ffd0ae49c28725de6549132e0ad3a3b69d20
And you can compute the git blob hash of any file using git hash-object:
  $ git hash-object --stdin < Makefile
  27b9569746179e68c635bdaab8e57395f63faf01
So if I understand you correctly, you would use the former when
generating your manifests from a revision, and the latter when verifying
the contents of the filesystem against those manifests.
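The two building blocks can be combined into a check that reports its result through the exit status alone. This is only a sketch under assumptions: it is run from the top level of the working tree, the path is given relative to that directory, and the function name matches_head and its exit-code convention are invented for this example:

```shell
# Hypothetical helper: compare a working-tree file with the blob recorded in HEAD.
#   0 -> the file's blob hash equals the one in HEAD
#   1 -> the hashes differ
#   2 -> the path is not in HEAD, or the file could not be hashed
matches_head() {
    expected=$(git rev-parse --verify --quiet "HEAD:$1") || return 2
    actual=$(git hash-object -- "$1") || return 2
    [ "$expected" = "$actual" ]
}
```

A caller can then write something like `matches_head Makefile || echo "Makefile differs from HEAD"` without parsing any git output.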
-Peff
* Re: checking sha1's of files
2009-02-08 11:13 ` Jeff King
@ 2009-02-08 12:22 ` Caleb Cushing
2009-02-08 12:27 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Caleb Cushing @ 2009-02-08 12:22 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
> So if I understand you correctly, you would use the former when
> generating your manifests from a revision, and the latter when verifying
> the contents of the filesystem against those manifests.
That sounds about right, actually, except there won't be manifests; e.g.
I may actually run
    find . -name Manifest -exec rm '{}' +
from the root of the tree (depending on whether I can get rid of
manifesting distfiles, but they don't change so often, so aren't
really a problem).
> $ git hash-object --stdin < Makefile
> 27b9569746179e68c635bdaab8e57395f63faf01
Is there any built-in way to do that and check whether the hash
matches HEAD (before I go and write a string comparison that
does)?
--
Caleb Cushing
http://xenoterracide.blogspot.com
* Re: checking sha1's of files
2009-02-08 12:22 ` Caleb Cushing
@ 2009-02-08 12:27 ` Jeff King
2009-02-08 12:57 ` Caleb Cushing
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2009-02-08 12:27 UTC (permalink / raw)
To: Caleb Cushing; +Cc: Junio C Hamano, git
On Sun, Feb 08, 2009 at 07:22:31AM -0500, Caleb Cushing wrote:
> > $ git hash-object --stdin < Makefile
> > 27b9569746179e68c635bdaab8e57395f63faf01
>
> is there any built-in way to do that and check to see if the hash
> matches HEAD? (before I go and write a string comparison so that it
> does)
If you want to know whether a file matches HEAD, just do:
    git diff --quiet HEAD -- $LIST_OF_FILES
which will return '0' for no changes or '1' if there are changes.
Which really has nothing to do with hashes at all (though git will use
them internally to avoid actually running a textual diff at all). I was
assuming that you didn't necessarily _have_ the git repository at
verification time. So the hash becomes an easy way of saying "this is
what the file _should_ look like".
But again, I'm not sure I really understand the workflow you're trying
to perform.
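For a calling tool that only wants a pass/fail answer, the command above can be wrapped so that nothing needs to be parsed. A minimal sketch, assuming it runs inside the git working tree; the function names verify_against_head and report_paths are hypothetical:

```shell
# Hypothetical wrapper around "git diff --quiet HEAD"; relies only on exit status.
#   0     -> every listed path matches HEAD
#   1     -> at least one listed path differs from HEAD
#   other -> git itself failed (for example, not inside a repository)
verify_against_head() {
    git diff --quiet HEAD -- "$@"
}

# Example caller: turn the exit status into a one-word report.
report_paths() {
    rc=0
    verify_against_head "$@" || rc=$?
    case $rc in
        0) echo "clean" ;;
        1) echo "modified" ;;
        *) echo "error" ;;
    esac
}
```

One caveat worth keeping in mind: git diff only looks at paths git knows about, so an untracked file that is absent from HEAD produces no difference and would not be flagged by this check.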
-Peff
* Re: checking sha1's of files
2009-02-08 12:27 ` Jeff King
@ 2009-02-08 12:57 ` Caleb Cushing
0 siblings, 0 replies; 7+ messages in thread
From: Caleb Cushing @ 2009-02-08 12:57 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
> If you want to know whether a file matches HEAD, just do:
>
> git diff --quiet HEAD -- $LIST_OF_FILES
>
> which will return '0' for no changes or '1' if there are changes.
*headdesks* and somehow I think this is exactly what I need.
> Which really has nothing to do with hashes at all (though git will use
> them internally to avoid actually running a textual diff at all). I was
> assuming that you didn't necessarily _have_ the git repository at
> verification time. So the hash becomes an easy way of saying "this is
> what the file _should_ look like".
I love git because it's so powerful, just like the rest of *nix. No
matter how long I use it, something always makes me feel like a n00b.
The problem is that I only half understand what git is doing, just
enough to attempt to communicate what I want to do. It doesn't
help that I'm not the expert on the other side of this problem either.
Thanks for the help, guys, even if I do come off like a moron.
--
Caleb Cushing
http://xenoterracide.blogspot.com