From: simon@hollie.ento.csiro.au (Simon Fowler)
To: Chris Mason <mason@suse.com>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: Finding file revisions
Date: Thu, 28 Apr 2005 18:41:57 +1000
Message-ID: <20050428084156.GK17682@himi.org>
In-Reply-To: <200504271831.47830.mason@suse.com>
[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]
On Wed, Apr 27, 2005 at 06:31:47PM -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached. New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
>
> Aha, didn't realize that one. Thanks, I'll rework things here.
>
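A minimal sketch of the point quoted above, assuming a toy commit type: because neighbouring entries in the rev-list output may come from unrelated branches, the pairs worth diffing are (parent, commit), not (previous, current). The names toy_commit, toy_pair and collect_diff_pairs are hypothetical and are not git's own structures or functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2

/* Toy stand-in for a commit; this is not git's struct commit. */
struct toy_commit {
	const char *id;
	const struct toy_commit *parents[MAX_PARENTS];
	int nr_parents;
};

/* One tree comparison that is meaningful: a commit against one parent. */
struct toy_pair {
	const struct toy_commit *parent;
	const struct toy_commit *child;
};

/*
 * For every commit in `list` (in any order), record one (parent, child)
 * pair per parent link. Entries that merely sit next to each other in
 * the list are never paired. Returns the number of pairs written,
 * which is at most `max`.
 */
static size_t collect_diff_pairs(const struct toy_commit *const *list, size_t n,
				 struct toy_pair *out, size_t max)
{
	size_t i, used = 0;
	int p;

	assert(list != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		for (p = 0; p < list[i]->nr_parents && used < max; p++) {
			out[used].parent = list[i]->parents[p];
			out[used].child = list[i];
			used++;
		}
	}
	return used;
}
```

With a merge commit that has two parents on different branches, the list order can interleave the branches freely and the collected pairs stay the same.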
I've got a version of this written in C that I've been working on
for a bit - some example output:
+040000 tree bfb75011c32589b282dd9c86621dadb0f0bb3866 ppc
+100644 blob 5ba4fc5259b063dab6417c142938d987ee894fc0 ppc/sha1.c
+100644 blob c3c51aa4d487f2e85c02b0257c1f0b57d6158d76 ppc/sha1.h
+100644 blob e85611a4ef0598f45911357d0d2f1fc354039de4 ppc/sha1ppc.S
commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4
You run it as:
find-changes commit_id file_prefix ...
The file_prefix is a path prefix to match - it's not as flexible as
regexes, but it shouldn't be too much less useful.
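A minimal sketch of what matching on a path prefix could look like, assuming a plain byte-wise comparison. path_has_prefix is a hypothetical helper and is not taken from the attached patch, which calls cache_name_compare() on the full path instead.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `path` starts with `prefix`, else 0.
 * The comparison is byte-wise, so "ppc" matches both "ppc/sha1.c" and
 * "ppcfoo"; append a '/' to the prefix to restrict it to a directory.
 * An empty prefix matches every path.
 */
static int path_has_prefix(const char *path, const char *prefix)
{
	size_t plen;

	assert(path != NULL && prefix != NULL);
	plen = strlen(prefix);
	/* If path is shorter than prefix, strncmp stops at path's NUL
	 * and reports a mismatch, so no separate length check is needed. */
	return strncmp(path, prefix, plen) == 0;
}
```

Under this assumption a prefix of "ppc/" would select ppc/sha1.c, ppc/sha1.h and ppc/sha1ppc.S from the example output above, but not a top-level file such as Makefile.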
Simon
--
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS)
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/
[-- Attachment #1.2: find-changes.diff --]
[-- Type: text/plain, Size: 8905 bytes --]
Find commits that changed files matching the prefix given on the command line.
Signed-off-by: Simon Fowler <simon@dreamcraft.com.au>
---
Index: Makefile
===================================================================
--- c3aa1e6b53cc59d5fbe261f3f859584904ae3a63/Makefile (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile (mode:100644)
@@ -38,7 +38,7 @@
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
check-files ls-tree merge-base merge-cache unpack-file git-export \
diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
- diff-tree-helper
+ diff-tree-helper find-changes
SCRIPT= commit-id tree-id parent-id cg-Xdiffdo cg-Xmergefile \
cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
Index: find-changes.c
===================================================================
--- /dev/null (tree:c3aa1e6b53cc59d5fbe261f3f859584904ae3a63)
+++ uncommitted/find-changes.c (mode:100644 sha1:64c0c3627d84969ee1596b05f97705455fba1871)
@@ -0,0 +1,279 @@
+/*
+ * find-changes.c - find the commits that changed a particular file.
+ */
+
+#include "cache.h"
+//#include "revision.h"
+#include "commit.h"
+#include <sys/param.h>
+
+/*
+ * This is a simple tool that walks through the revisions cache and
+ * checks the parent-child diffs to see if they include the given
+ * filename.
+ */
+
+static int recursive = 1;
+static int found = 0;
+
+static char *malloc_base(const char *base, const char *path, int pathlen)
+{
+ int baselen = strlen(base);
+ char *newbase = malloc(baselen + pathlen + 2);
+ memcpy(newbase, base, baselen);
+ memcpy(newbase + baselen, path, pathlen);
+ memcpy(newbase + baselen + pathlen, "/", 2);
+ return newbase;
+}
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+ void *buf = *bufp;
+ unsigned long size = *sizep;
+ int len = strlen(buf) + 1 + 20;
+
+ if (size < len)
+ die("corrupt tree file");
+ *bufp = buf + len;
+ *sizep = size - len;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+{
+ int len = strlen(tree)+1;
+ const unsigned char *sha1 = tree + len;
+ const char *path = strchr(tree, ' ');
+
+ if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+ die("corrupt tree file");
+ *pathp = path+1;
+ return sha1;
+}
+
+static int check_file(void *tree, unsigned long size, const char *base, const char *target);
+
+/* A whole sub-tree went away or appeared */
+static int check_tree(void *tree, unsigned long size, const char *base, const char *target)
+{
+ int retval = 0;
+
+ while (size && !retval) {
+ retval = check_file(tree, size, base, target);
+ update_tree_entry(&tree, &size);
+ }
+ return retval;
+}
+
+/* A file entry went away or appeared.
+ * Check the entire subtree under this, and set the global "found" flag
+ * if we find the target. */
+static int check_file(void *tree, unsigned long size, const char *base, const char *target)
+{
+ unsigned mode;
+ const char *path;
+ char full_path[MAXPATHLEN + 1];
+ int pathlen, retval;
+ const unsigned char *sha1 = extract(tree, size, &path, &mode);
+
+ pathlen = snprintf(full_path, MAXPATHLEN, "%s%s", base, path);
+ if (!cache_name_compare(full_path, pathlen, target, strlen(target)))
+ found = 1;
+
+ if (recursive && S_ISDIR(mode)) {
+ char type[20];
+ unsigned long size;
+ char *newbase = malloc_base(base, path, strlen(path));
+ void *tree;
+
+ tree = read_sha1_file(sha1, type, &size);
+ if (!tree || strcmp(type, "tree"))
+ die("corrupt tree sha %s", sha1_to_hex(sha1));
+
+ retval = check_tree(tree, size, newbase, target);
+
+ free(tree);
+ free(newbase);
+ return retval;
+ }
+ return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, const char *target);
+
+/* the diff-tree algorithm depends on compare_tree_entry returning basically
+ * the same thing that memcmp() would on the filenames - this is important
+ * because the directories are sorted, and hence you need to decide which tree to advance */
+static int compare_tree_entry(void *tree1, unsigned long size1,
+ void *tree2, unsigned long size2,
+ const char *base, const char *target)
+{
+ unsigned mode1, mode2;
+ const char *path1, *path2;
+ const unsigned char *sha1, *sha2;
+ int cmp, pathlen1, pathlen2;
+
+ if (found)
+ return 0;
+
+ sha1 = extract(tree1, size1, &path1, &mode1);
+ sha2 = extract(tree2, size2, &path2, &mode2);
+
+ pathlen1 = strlen(path1);
+ pathlen2 = strlen(path2);
+ cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
+ /* these files are different - if this is a directory then the
+ * contents of the subtree are all different. So, we need to
+ * run over the subtree and see if our target is in there
+ * . . . */
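+ if (cmp) {
+ /* only the entry that sorts first is missing from the other tree */
+ check_file(cmp < 0 ? tree1 : tree2, cmp < 0 ? size1 : size2, base, target);
+ return cmp < 0 ? -1 : 1; /* diff_tree() only handles -1, 0 and 1 */
+ }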
+
+ if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
+ return 0;
+
+ /*
+ * If the filemode has changed to/from a directory from/to a regular
+ * file, we need to consider it a remove and an add.
+ */
+ if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
+ check_file(tree1, size1, base, target);
+ check_file(tree2, size2, base, target);
+ return 0;
+ }
+
+ if (recursive && S_ISDIR(mode1)) {
+ int retval;
+ char *newbase = malloc_base(base, path1, pathlen1);
+ retval = diff_tree_sha1(sha1, sha2, newbase, target);
+ free(newbase);
+ return retval;
+ }
+
+ check_file(tree1, size1, base, target);
+ check_file(tree2, size2, base, target);
+ return 0;
+}
+
+static int diff_tree(void *tree1, unsigned long size1, void *tree2, unsigned long size2,
+ const char *base, const char *target)
+{
+ while (size1 | size2) {
+ if (!size1) {
+ check_file(tree2, size2, base, target);
+ update_tree_entry(&tree2, &size2);
+ continue;
+ }
+ if (!size2) {
+ check_file(tree1, size1, base, target);
+ update_tree_entry(&tree1, &size1);
+ continue;
+ }
+ switch (compare_tree_entry(tree1, size1, tree2, size2, base, target)) {
+ case -1:
+ update_tree_entry(&tree1, &size1);
+ continue;
+ case 0:
+ update_tree_entry(&tree1, &size1);
+ /* Fallthrough */
+ case 1:
+ update_tree_entry(&tree2, &size2);
+ continue;
+ }
+ die("diff-tree: internal error");
+ }
+ return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base,
+ const char *target)
+{
+ void *tree1, *tree2;
+ unsigned long size1, size2;
+ char type[20];
+ int retval;
+
+ tree1 = read_sha1_file(old, type, &size1);
+ if (!tree1 || strcmp(type, "tree"))
+ die("unable to read source tree %s", sha1_to_hex(old));
+ tree2 = read_sha1_file(new, type, &size2);
+ if (!tree2 || strcmp(type, "tree"))
+ die("unable to read destination tree %s", sha1_to_hex(new));
+ retval = diff_tree(tree1, size1, tree2, size2, base, target);
+ free(tree1);
+ free(tree2);
+ return retval;
+}
+
+static int process_diffs(struct commit *parent, struct commit *commit, const char *target)
+{
+ found = 0;
+ diff_tree_sha1(parent->tree->object.sha1, commit->tree->object.sha1, "", target);
+ if (found)
+ printf("%s\n", sha1_to_hex(commit->object.sha1));
+ return 0;
+}
+
+/*
+ * Walk the set of parents, and collect a list of the objects.
+ */
+void process_commit(struct commit *item)
+{
+ struct commit_list *parents;
+
+ if (parse_commit(item))
+ die("unable to parse commit %s", sha1_to_hex(item->object.sha1));
+
+ parents = item->parents;
+ while (parents) {
+ process_commit(parents->item);
+ parents = parents->next;
+ }
+}
+
+/*
+ * Usage: find-changes <parent-id> <filename>
+ *
+ * Note that this code will find the commits that change the given
+ * file in the set of commits that are parents of the one given on the
+ * command line.
+ */
+
+int main(int argc, char **argv)
+{
+ int i;
+ unsigned char sha1[20];
+ struct commit *orig;
+
+ if (argc != 3)
+ usage("find-changes <parent-id> <filename>");
+
+ get_sha1_hex(argv[1], sha1);
+ orig = lookup_commit(sha1);
+ process_commit(orig);
+ mark_reachable(&orig->object, 1);
+
+ /* this code needs to use tree.c to do most of the work - this
+ * will simplify things a lot.
+ * XXX: rewrite diff-tree.c to do the same. */
+
+ for (i = 0; i < nr_objs; i++) {
+ struct object *obj = objs[i];
+ struct commit *commit;
+ struct commit_list *p;
+
+ if (obj->type != commit_type)
+ continue;
+
+ commit = (struct commit *) obj;
+
+ p = commit->parents;
+ while (p) {
+ process_diffs(p->item, commit, argv[2]);
+ p = p->next;
+ }
+ }
+ return 0;
+}
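A minimal sketch of the lockstep walk that diff_tree() in the patch above relies on, reduced to two sorted arrays of names. It assumes plain strcmp() ordering, which is a simplification of how tree entries are actually sorted; toy_walk_result and walk_sorted are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts produced by walking two sorted name lists in lockstep. */
struct toy_walk_result {
	size_t removed;  /* names only in the old list */
	size_t added;    /* names only in the new list */
	size_t common;   /* names present in both lists */
};

/* Both arrays must already be sorted in strcmp() order. */
static struct toy_walk_result walk_sorted(const char *const *old_names, size_t n_old,
					  const char *const *new_names, size_t n_new)
{
	struct toy_walk_result r = { 0, 0, 0 };
	size_t i = 0, j = 0;

	assert(old_names != NULL && new_names != NULL);
	while (i < n_old || j < n_new) {
		int cmp;

		if (i == n_old)
			cmp = 1;   /* old side exhausted: the rest of new is added */
		else if (j == n_new)
			cmp = -1;  /* new side exhausted: the rest of old is removed */
		else
			cmp = strcmp(old_names[i], new_names[j]);

		if (cmp < 0) {
			/* old name sorts first, so it has no partner in new */
			r.removed++;
			i++;
		} else if (cmp > 0) {
			/* new name sorts first, so it has no partner in old */
			r.added++;
			j++;
		} else {
			/* same name on both sides: advance both lists */
			r.common++;
			i++;
			j++;
		}
	}
	return r;
}
```

The sign of the comparison decides which side advances, which is why only the entry that sorts first can be treated as added or removed at each step.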
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]