* [JGIT RFC] How read versions of a specific object
@ 2009-01-07 3:44 Imran M Yousuf
2009-01-07 4:04 ` Shawn O. Pearce
0 siblings, 1 reply; 3+ messages in thread
From: Imran M Yousuf @ 2009-01-07 3:44 UTC (permalink / raw)
To: Git Mailing List
Hi,
I am trying to read all versions, or the n-th version, of an object.
Currently I do this with the following piece of code, which has to walk
every commit that is present and build a set of the object's ids from
there. That is definitely expensive if the commit history is huge. Is
there a faster/better way to achieve it?
    for (int i = 0; i < App.OBJECT_COUNT; ++i) {
        System.out.println("INDEX: " + i);
        String isbn = String.valueOf(Integer.parseInt(App.INIT_ID) + i);
        System.out.println("ISBN: " + isbn);
        ObjectWalk objectWalk = new ObjectWalk(repo);
        /*
         * Checks whether the Commit has the tree or not. It does not
         * check whether it has changed or not.
         */
        objectWalk.setTreeFilter(PathFilter.create(isbn));
        RevObject revObject = null;
        objectWalk.markStart(objectWalk.parseCommit(repo.resolve(
                Constants.HEAD)));
        Set<ObjectId> revisions = new HashSet<ObjectId>();
        do {
            if (revObject != null) {
                Commit revision = repo.mapCommit(revObject.getId());
                Tree versionTree = repo.mapTree(revision.getTreeId());
                if (versionTree.existsBlob(isbn)) {
                    revisions.add(versionTree.findBlobMember(isbn).getId());
                }
            }
            revObject = objectWalk.next();
        } while (revObject != null);
        System.out.println("Revisions: " + revisions);
    }
The detailed source code of the project is available at
http://github.com/imyousuf/jgit-usage/tree/master
Thank you,
--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
* Re: [JGIT RFC] How read versions of a specific object
2009-01-07 3:44 [JGIT RFC] How read versions of a specific object Imran M Yousuf
@ 2009-01-07 4:04 ` Shawn O. Pearce
2009-01-07 9:23 ` Imran M Yousuf
0 siblings, 1 reply; 3+ messages in thread
From: Shawn O. Pearce @ 2009-01-07 4:04 UTC (permalink / raw)
To: Imran M Yousuf; +Cc: Git Mailing List
Imran M Yousuf <imyousuf@gmail.com> wrote:
> I am trying to read all or n-th version of an object. Currently to do
> this I am using the following piece of code, which has to walk to
> every commit is present and from there prepare a set of its object id,
> it is definitely expensive if the commit history is huge, is there a
> faster/better way to achieve it?
Not really. You can more efficiently use JGit and reduce some of
the overheads, but that's about it.
> for (int i = 0; i < App.OBJECT_COUNT;
> ++i) {
> ObjectWalk objectWalk = new ObjectWalk(repo);
Don't use ObjectWalk, use a RevWalk. You don't need it to keep
track of tree or blob identities. The ObjectWalk code has more
overhead to do that bookkeeping.
> Commit revision = repo.mapCommit(revObject.getId());
> Tree versionTree = repo.mapTree(revision.getTreeId());
> if (versionTree.existsBlob(isbn)) {
> revisions.add(versionTree.findBlobMember(isbn).getId());
Use a TreeWalk to do this. It's quicker because it doesn't
have to parse as much data to come up with the same result.
More specifically there's a static factory method that sets up for
a path limited walk and returns the TreeWalk pointing at that entry.
You can use the fact that RevWalk.next() returns a RevCommit to get
you the RevTree, which is the tree you need to give to the TreeWalk
constructor (it's the root-level tree of the commit).
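The per-path approach described above can be modeled without JGit to show its shape: walk the commits once, and in each commit's root tree descend only along the components of the wanted path. I believe the static factory mentioned above is TreeWalk.forPath(...) in current JGit, but the sketch below deliberately does not call the library. ModelCommit, PathLimitedLookup, the nested "books/..." paths, and the blob ids are all hypothetical stand-ins, not JGit types.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a commit: a root tree plus parent links.
// A tree maps a name to either a String (blob id) or a nested Map (subtree).
final class ModelCommit {
    final Map<String, Object> rootTree;
    final List<ModelCommit> parents;

    ModelCommit(Map<String, Object> rootTree, ModelCommit... parents) {
        this.rootTree = rootTree;
        this.parents = Arrays.asList(parents);
    }
}

final class PathLimitedLookup {
    // Helper to build a model tree from alternating name/value arguments.
    static Map<String, Object> tree(Object... nameThenValue) {
        Map<String, Object> t = new LinkedHashMap<String, Object>();
        for (int i = 0; i < nameThenValue.length; i += 2)
            t.put((String) nameThenValue[i], nameThenValue[i + 1]);
        return t;
    }

    // Descend only along the components of 'path'; sibling entries are never visited.
    // Returns the blob id, or null if the path is missing or names a subtree.
    static String findBlob(Map<String, Object> root, String path) {
        Object node = root;
        for (String name : path.split("/")) {
            if (!(node instanceof Map))
                return null;
            node = ((Map<?, ?>) node).get(name);
        }
        return node instanceof String ? (String) node : null;
    }

    // Visit every commit reachable from 'head' once and collect the distinct
    // blob ids found at 'path'.
    static Set<String> versionsOf(ModelCommit head, String path) {
        Set<String> ids = new LinkedHashSet<String>();
        Set<ModelCommit> seen = new HashSet<ModelCommit>(); // identity-based: no equals() override
        Deque<ModelCommit> todo = new ArrayDeque<ModelCommit>();
        todo.push(head);
        while (!todo.isEmpty()) {
            ModelCommit c = todo.pop();
            if (!seen.add(c))
                continue;
            String id = findBlob(c.rootTree, path);
            if (id != null)
                ids.add(id);
            for (ModelCommit p : c.parents)
                todo.push(p);
        }
        return ids;
    }

    public static void main(String[] args) {
        ModelCommit c1 = new ModelCommit(tree("books", tree("1001", "blob-a", "1002", "blob-x")));
        ModelCommit c2 = new ModelCommit(tree("books", tree("1001", "blob-b", "1002", "blob-x")), c1);
        ModelCommit c3 = new ModelCommit(tree("books", tree("1001", "blob-b", "1002", "blob-y")), c2);
        System.out.println(versionsOf(c3, "books/1001"));
    }
}
```

The cost is still one lookup per commit per path, which is why this only pays off when a few paths are of interest.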
But if App.OBJECT_COUNT is quite large and covers most of your
objects, you are probably better off using a loop over the commits
and diff'ing against the ancestor:
    final HashMap<String, Set<ObjectId>> versions = ...;
    final RevWalk rw = new RevWalk(repo);
    final TreeWalk tw = new TreeWalk(repo);
    rw.markStart(rw.parseCommit(repo.resolve(Constants.HEAD)));
    tw.setFilter(TreeFilter.ANY_DIFF);

    RevCommit c;
    while ((c = rw.next()) != null) {
        // One slot per parent tree, plus a final slot for this commit's tree.
        final ObjectId[] p = new ObjectId[c.getParentCount() + 1];
        for (int i = 0; i < c.getParentCount(); i++) {
            rw.parse(c.getParent(i));
            p[i] = c.getParent(i).getTree();
        }
        final int me = p.length - 1;
        p[me] = c.getTree();
        tw.reset(p);
        while (tw.next()) {
            if (tw.getFileMode(me).getObjectType() == Constants.OBJ_BLOB) {
                // This path was modified relative to the ancestor(s).
                //
                String s = tw.getPathString();
                Set<ObjectId> i = versions.get(s);
                if (i == null)
                    versions.put(s, i = new HashSet<ObjectId>());
                i.add(tw.getObjectId(me));
            }

            if (tw.isSubtree()) {
                // make sure we recurse into modified directories
                tw.enterSubtree();
            }
        }
    }
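To make the idea behind the loop above concrete, here is a minimal JGit-free model of it, offered only as a sketch. It assumes each commit carries a flattened path-to-blob-id map, and it records a blob id for a path whenever the commit differs from at least one parent at that path (a root commit counts as all new). FlatCommit, DiffWalkModel, and the ids are hypothetical names for illustration; the real walk operates on tree objects and handles file modes, which this model ignores.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a commit: flattened files (path -> blob id) plus parents.
final class FlatCommit {
    final Map<String, String> files;
    final List<FlatCommit> parents;

    FlatCommit(Map<String, String> files, FlatCommit... parents) {
        this.files = files;
        this.parents = Arrays.asList(parents);
    }
}

final class DiffWalkModel {
    // Helper to build a files map from alternating path/id arguments.
    static Map<String, String> files(String... pathThenId) {
        Map<String, String> m = new TreeMap<String, String>();
        for (int i = 0; i < pathThenId.length; i += 2)
            m.put(pathThenId[i], pathThenId[i + 1]);
        return m;
    }

    // True if the commit has no parents, or some parent holds a different
    // (or no) blob at this path.
    static boolean differsFromSomeParent(FlatCommit c, String path, String id) {
        if (c.parents.isEmpty())
            return true;
        for (FlatCommit p : c.parents)
            if (!id.equals(p.files.get(path)))
                return true;
        return false;
    }

    // Single pass over history: every path's versions are collected together.
    static Map<String, Set<String>> collectVersions(FlatCommit head) {
        Map<String, Set<String>> versions = new TreeMap<String, Set<String>>();
        Set<FlatCommit> seen = new HashSet<FlatCommit>(); // identity-based: no equals() override
        Deque<FlatCommit> todo = new ArrayDeque<FlatCommit>();
        todo.push(head);
        while (!todo.isEmpty()) {
            FlatCommit c = todo.pop();
            if (!seen.add(c))
                continue;
            for (Map.Entry<String, String> e : c.files.entrySet()) {
                if (differsFromSomeParent(c, e.getKey(), e.getValue())) {
                    Set<String> ids = versions.get(e.getKey());
                    if (ids == null)
                        versions.put(e.getKey(), ids = new TreeSet<String>());
                    ids.add(e.getValue());
                }
            }
            for (FlatCommit p : c.parents)
                todo.push(p);
        }
        return versions;
    }

    public static void main(String[] args) {
        FlatCommit c1 = new FlatCommit(files("1001", "a1", "1002", "b1"));
        FlatCommit c2 = new FlatCommit(files("1001", "a2", "1002", "b1"), c1);
        FlatCommit c3 = new FlatCommit(files("1001", "a2", "1002", "b2"), c2);
        System.out.println(collectVersions(c3)); // {1001=[a1, a2], 1002=[b1, b2]}
    }
}
```

Unchanged paths are skipped at each commit, so the work per commit is proportional to what changed rather than to the number of tracked objects.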
--
Shawn.
* Re: [JGIT RFC] How read versions of a specific object
2009-01-07 4:04 ` Shawn O. Pearce
@ 2009-01-07 9:23 ` Imran M Yousuf
0 siblings, 0 replies; 3+ messages in thread
From: Imran M Yousuf @ 2009-01-07 9:23 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Git Mailing List
On Wed, Jan 7, 2009 at 10:04 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Imran M Yousuf <imyousuf@gmail.com> wrote:
>> I am trying to read all or n-th version of an object. Currently to do
>> this I am using the following piece of code, which has to walk to
>> every commit is present and from there prepare a set of its object id,
>> it is definitely expensive if the commit history is huge, is there a
>> faster/better way to achieve it?
>
> Not really. You can more efficiently use JGit and reduce some of
> the overheads, but that's about it.
>
Thanks, Shawn, for pointing that out. It actually does improve the
performance; for every lookup it's about 200 ms.
Best regards,
Imran
--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557