git.vger.kernel.org archive mirror
* [JGIT RFC] How read versions of a specific object
@ 2009-01-07  3:44 Imran M Yousuf
From: Imran M Yousuf @ 2009-01-07  3:44 UTC
  To: Git Mailing List

Hi,

I am trying to read all versions, or the n-th version, of an object.
Currently I am using the following piece of code, which has to walk
every commit that is present and build a set of the object's ids from
there. This is definitely expensive if the commit history is huge. Is
there a faster/better way to achieve it?

for (int i = 0; i < App.OBJECT_COUNT; ++i) {
    System.out.println("INDEX: " + i);
    String isbn = String.valueOf(Integer.parseInt(App.INIT_ID) + i);
    System.out.println("ISBN: " + isbn);
    ObjectWalk objectWalk = new ObjectWalk(repo);
    /*
     * Checks whether the Commit has the tree or not. It does not
     * check whether it has changed or not.
     */
    objectWalk.setTreeFilter(PathFilter.create(isbn));
    RevObject revObject = null;
    objectWalk.markStart(objectWalk.parseCommit(repo.resolve(
        Constants.HEAD)));
    Set<ObjectId> revisions = new HashSet<ObjectId>();
    do {
        if (revObject != null) {
            Commit revision = repo.mapCommit(revObject.getId());
            Tree versionTree = repo.mapTree(revision.getTreeId());
            if (versionTree.existsBlob(isbn)) {
                revisions.add(versionTree.findBlobMember(isbn).getId());
            }
        }
        revObject = objectWalk.next();
    } while (revObject != null);
    System.out.println("Revisions: " + revisions);
}

The detailed source code of the project is available at
http://github.com/imyousuf/jgit-usage/tree/master

Thank you,

-- 
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557

* Re: [JGIT RFC] How read versions of a specific object
@ 2009-01-07  4:04 ` Shawn O. Pearce
From: Shawn O. Pearce @ 2009-01-07  4:04 UTC
  To: Imran M Yousuf; +Cc: Git Mailing List

Imran M Yousuf <imyousuf@gmail.com> wrote:
> I am trying to read all versions, or the n-th version, of an object.
> Currently I am using the following piece of code, which has to walk
> every commit that is present and build a set of the object's ids from
> there. This is definitely expensive if the commit history is huge. Is
> there a faster/better way to achieve it?

Not really. You can more efficiently use JGit and reduce some of
the overheads, but that's about it.

> for (int i = 0; i < App.OBJECT_COUNT; ++i) {
>     ObjectWalk objectWalk = new ObjectWalk(repo);

Don't use ObjectWalk, use a RevWalk.  You don't need it to keep
track of tree or blob identities.  The ObjectWalk code has more
overhead to do that bookkeeping.

>                     Commit revision = repo.mapCommit(revObject.getId());
>                     Tree versionTree = repo.mapTree(revision.getTreeId());
>                     if (versionTree.existsBlob(isbn)) {
>                         revisions.add(versionTree.findBlobMember(isbn).getId());

Use a TreeWalk to do this.  It's quicker because it doesn't
have to parse as much data to come up with the same result.

More specifically, there's a static factory method that sets up for
a path-limited walk and returns the TreeWalk pointing at that entry.

You can use the fact that RevWalk.next() returns a RevCommit to get
you the RevTree, which is the tree you need to give to the TreeWalk
constructor (it's the root-level tree of the commit).
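
The per-path approach described above can be sketched on a self-contained
toy model. This is not JGit code: the Commit class and the versionsOf
helper are hypothetical stand-ins. A flat path -> blob id map plays the
role of the commit's root tree, and a plain map lookup plays the role of
the path-limited TreeWalk (in JGit the static factory is, as far as I
know, TreeWalk.forPath). The model also assumes a linear, single-parent
history. It only shows the shape of the loop: walk the commits, look up
the one path in each, and collect the distinct blob ids.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class PathHistory {
    /** Hypothetical stand-in for a commit: one parent plus a flat path -> blob id map. */
    static final class Commit {
        final Commit parent; // null for the root commit
        final Map<String, String> tree;

        Commit(Commit parent, Map<String, String> tree) {
            this.parent = parent;
            this.tree = tree;
        }
    }

    /** Walk from head back to the root, collecting each distinct blob id found at path. */
    static Set<String> versionsOf(Commit head, String path) {
        Set<String> ids = new LinkedHashSet<>();
        for (Commit c = head; c != null; c = c.parent) {
            String id = c.tree.get(path); // stands in for the path-limited tree lookup
            if (id != null) {
                ids.add(id); // the set drops ids already seen in newer commits
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Commit c1 = new Commit(null, Map.of("9781", "blobA"));
        Commit c2 = new Commit(c1, Map.of("9781", "blobA", "9782", "blobX"));
        Commit c3 = new Commit(c2, Map.of("9781", "blobB", "9782", "blobX"));
        System.out.println(versionsOf(c3, "9781"));
    }
}
```

Note that every call to versionsOf walks the whole history again, so
looking up many paths this way repeats the commit walk once per path.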


But if App.OBJECT_COUNT is quite large and covers most of your
objects, you are probably better off using a loop over the commits
and diff'ing against the ancestor:

	final HashMap<String, Set<ObjectId>> versions = ...;
	final RevWalk rw = new RevWalk(repo);
	final TreeWalk tw = new TreeWalk(repo);
	rw.markStart(rw.parseCommit(repo.resolve(Constants.HEAD)));
	// Only report paths that differ between the trees being compared.
	tw.setFilter(TreeFilter.ANY_DIFF);

	RevCommit c;
	while ((c = rw.next()) != null) {
		// Slots 0..n-1 hold the parent trees; the last slot holds this commit's tree.
		final ObjectId[] p = new ObjectId[c.getParentCount() + 1];
		for (int i = 0; i < c.getParentCount(); i++) {
			// Parse the parent so its tree is available.
			rw.parse(c.getParent(i));
			p[i] = c.getParent(i).getTree();
		}
		final int me = p.length - 1;
		p[me] = c.getTree();
		tw.reset(p);
		while (tw.next()) {
			if (tw.getFileMode(me).getObjectType() == Constants.OBJ_BLOB) {
				// This path was modified relative to the ancestor(s).
				//
				String s = tw.getPathString();
				Set<ObjectId> i = versions.get(s);
				if (i == null)
					versions.put(s, i = new HashSet<ObjectId>());
				i.add(tw.getObjectId(me));
			}

			if (tw.isSubtree()) {
				// make sure we recurse into modified directories
				tw.enterSubtree();
			}
		}
	}
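
To make the logic of the loop above easier to follow, here is a minimal
sketch of the same idea on a toy in-memory model rather than on JGit.
Everything in it is hypothetical: the Commit class, the flat
path -> blob id maps standing in for trees, and the collect and differs
helpers. Comparing a commit's entries with each parent's map stands in
for a TreeWalk using TreeFilter.ANY_DIFF, and a root commit is treated
as introducing all of its entries.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class VersionIndex {
    /** Hypothetical stand-in for a commit: its parents plus a flat path -> blob id map. */
    static final class Commit {
        final Map<String, String> tree;
        final List<Commit> parents;

        Commit(Map<String, String> tree, Commit... parents) {
            this.tree = tree;
            this.parents = Arrays.asList(parents);
        }
    }

    /** True if the entry is new or changed relative to at least one parent. */
    static boolean differs(Commit c, String path, String id) {
        if (c.parents.isEmpty()) {
            return true; // root commit: every entry counts as new
        }
        for (Commit p : c.parents) {
            if (!id.equals(p.tree.get(path))) {
                return true;
            }
        }
        return false;
    }

    /** Visit each commit reachable from head once; record the blob ids each commit introduced. */
    static Map<String, Set<String>> collect(Commit head) {
        Map<String, Set<String>> versions = new TreeMap<>();
        Set<Commit> seen = new HashSet<>(); // identity-based: Commit does not override equals
        Deque<Commit> todo = new ArrayDeque<>();
        seen.add(head);
        todo.push(head);
        while (!todo.isEmpty()) {
            Commit c = todo.pop();
            for (Map.Entry<String, String> e : c.tree.entrySet()) {
                if (differs(c, e.getKey(), e.getValue())) {
                    versions.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                            .add(e.getValue());
                }
            }
            for (Commit p : c.parents) {
                if (seen.add(p)) { // skip commits already queued via another child
                    todo.push(p);
                }
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        Commit c1 = new Commit(Map.of("a", "a1", "b", "b1"));
        Commit c2 = new Commit(Map.of("a", "a2", "b", "b1"), c1);
        Commit c3 = new Commit(Map.of("a", "a2", "b", "b2"), c2);
        System.out.println(collect(c3));
    }
}
```

The point of this shape is that the history is walked once and every
path's versions are filled in along the way, instead of walking the
history once per path.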

-- 
Shawn.

* Re: [JGIT RFC] How read versions of a specific object
@ 2009-01-07  9:23   ` Imran M Yousuf
From: Imran M Yousuf @ 2009-01-07  9:23 UTC
  To: Shawn O. Pearce; +Cc: Git Mailing List

On Wed, Jan 7, 2009 at 10:04 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Imran M Yousuf <imyousuf@gmail.com> wrote:
>> I am trying to read all versions, or the n-th version, of an object.
>> Currently I am using the following piece of code, which has to walk
>> every commit that is present and build a set of the object's ids from
>> there. This is definitely expensive if the commit history is huge. Is
>> there a faster/better way to achieve it?
>
> Not really. You can more efficiently use JGit and reduce some of
> the overheads, but that's about it.
>

Thanks, Shawn, for pointing this out. It actually does improve the
performance; for every lookup it's like 200ms.

Best regards,

Imran
