* [egit / jgit] Implementation of a file tree iteration using ignore rules.
@ 2008-05-09 13:20 Florian Köberle
2008-05-10 0:11 ` Shawn O. Pearce
0 siblings, 1 reply; 4+ messages in thread
From: Florian Köberle @ 2008-05-09 13:20 UTC (permalink / raw)
To: git; +Cc: spearce, robin.rosenberg
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi
I like the idea of a Java implementation of git and would like to
contribute to the jgit/egit project.
In order to get familiar with the code I started to implement a command-line
tool which works like git using the jgit library. I implemented
very simple versions of the commands "help", "init" and finally wanted
to implement the "add" command. However, I didn't find any tools to
determine the files which should be added.
So I implemented a factory which returns an Iterable<File> for the
iteration over all the files in a directory.
For an example see the unit test testRealisticExample() in the class
FileIterableFactoryForAddCommandTest:
http://repo.or.cz/w/egit/florian.git?a=blob;f=org.spearce.jgit.test/tst/org/spearce/jgit/lib/fileiteration/FileIterableFactoryForAddCommandTest.java;h=d3c78f4422c708f26ccb56434053bb711fa3116b;hb=669fd814d34e2f989b5f8eedbcb0d5bcf9743ce7
You can view the patches online at:
http://repo.or.cz/w/egit/florian.git?a=shortlog;h=refs/heads/mailinglist-patches-0
I signed all patches and formatted them with the code formatter as I
should. It's ok for me to put the patches under a dual license between a
3-clause BSD and the EPL[*3*]. Currently all files have a GPL 2 notice.
I hope that is ok.
If you want I will send the patches to the mailing list, but I don't
know any automated way to create all the emails. I am not even sure if I
will get them formatted correctly with Thunderbird 2. It would be cool
if you could tell me how to send patches via command line.
I hope you like my patches,
Florian Koeberle
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIJE+I59ca4mzhfxMRAqlwAKCSp57SkqvVsBpdt8o3jL6zNdn0kACfeLnZ
IHErO96fu2rdQcT+JpmroYU=
=E+vF
-----END PGP SIGNATURE-----
* Re: [egit / jgit] Implementation of a file tree iteration using ignore rules.
2008-05-09 13:20 [egit / jgit] Implementation of a file tree iteration using ignore rules Florian Köberle
@ 2008-05-10 0:11 ` Shawn O. Pearce
2008-05-10 15:11 ` Florian Köberle
0 siblings, 1 reply; 4+ messages in thread
From: Shawn O. Pearce @ 2008-05-10 0:11 UTC (permalink / raw)
To: Florian Köberle; +Cc: git, robin.rosenberg
Florian Köberle <FloriansKarten@web.de> wrote:
>
> I like the idea of a Java implementation of git and would like to
> contribute to the jgit/egit project.
Even better, we'd love to have you contribute! :-)
> In order to get familiar with the code I started to implement a command-line
> tool which works like git using the jgit library. I implemented
> very simple versions of the commands "help", "init"
This is an interesting start. Did you see the existing "Main" class
in org.spearce.jgit/src/org/spearce/jgit/pgm? It sets up and invokes
a TextBuiltin, which is sort of like the "Command" class you added in
your first patch. Though TextBuiltins are created on-the-fly and thus
are harder/impossible to use to format a "jgit help".
I think your approach of building up a table of commands is likely
the better one long term, so I am interested in seeing the two unify,
taking the best from each (from Main and MainProgram that is).
Please note that jgit is restricted to Java 5 APIs only right now.
The "MainProgram" class you introduced uses Arrays.copyOfRange()
which does not compile under Java 5. I guess it is new in Java 6?
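As a rough illustration of how the same effect can be had with Java 5 APIs
only, the sketch below copies a sub-range of an argument array with
System.arraycopy(). The class name ArgsUtil and the sample arguments are
hypothetical and not taken from the patches. Unlike Arrays.copyOfRange(),
this version rejects an end index past the array instead of padding.

```java
/** Hypothetical helper; not part of jgit or of the patch series. */
final class ArgsUtil {
    private ArgsUtil() {
    }

    /** Returns a copy of src[from..to) using only Java 5 APIs. */
    static String[] copyOfRange(final String[] src, final int from, final int to) {
        if (from < 0 || from > to || to > src.length)
            throw new IllegalArgumentException("bad range " + from + ".." + to);
        final String[] dst = new String[to - from];
        System.arraycopy(src, from, dst, 0, dst.length);
        return dst;
    }

    public static void main(final String[] args) {
        final String[] argv = { "add", "--verbose", "file.txt" };
        // Drop the command name and keep only its arguments.
        final String[] rest = copyOfRange(argv, 1, argv.length);
        System.out.println(java.util.Arrays.toString(rest)); // [--verbose, file.txt]
    }
}
```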
> and finally wanted
> to implement the "add" command. However, I didn't find any tools to
> determine the files which should be added.
Right. We haven't implemented this properly yet. So I am very
happy to see someone starting to approach this.
> So I implemented a factory which returns an Iterable<File> for the
> iteration over all the files in a directory.
Sadly this is a reimplementation of the already existing FileTreeIterator,
which is meant to be used within a TreeWalk instance.
The TreeWalk API is meant to iterate over a working directory in
canonical tree entry name ordering, so that we can walk not just
a working directory but also the index file and one or more tree
objects in parallel. We can even walk multiple working directories
at once making directory differencing fairly simple.
What is missing here is really two things:
#1) Take .gitignore and .git/info/exclude (and other patterns) into
account as WorkingTreeIterator (base class of FileTreeIterator)
loops over the entries in a directory.
Since .gitignore can be per-directory we may need to add rules
as we enter into a subtree (createSubtreeIterator method)
and pop rules as we exit a subtree.
Fortunately the pop part is easy if the rules are held within
the WorkingTreeIterator as instance members as pop is already
dealt with up inside of TreeWalk by simply discarding the
subtree instance and returning back to the parent instance.
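The push/pop idea in #1 can be sketched without any jgit classes. The class
ScopedIterator below is a hypothetical stand-in for WorkingTreeIterator, and
the signature given to createSubtreeIterator here is an assumption made for
the example only. Each child inherits its parent's patterns and adds its own.
Popping needs no code: dropping the child and going back to the parent
restores the parent's rule set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an iterator that carries ignore patterns. */
final class ScopedIterator {
    private final ScopedIterator parent;

    private final List<String> patterns;

    ScopedIterator(final ScopedIterator parent, final List<String> localPatterns) {
        this.parent = parent;
        final List<String> all = new ArrayList<String>();
        if (parent != null)
            all.addAll(parent.patterns); // inherit rules from enclosing directories
        all.addAll(localPatterns); // add rules from this directory's own .gitignore
        this.patterns = Collections.unmodifiableList(all);
    }

    /** Entering a subtree "pushes" the child's rules on top of ours. */
    ScopedIterator createSubtreeIterator(final List<String> childPatterns) {
        return new ScopedIterator(this, childPatterns);
    }

    ScopedIterator getParent() {
        return parent;
    }

    List<String> getPatterns() {
        return patterns;
    }

    public static void main(final String[] args) {
        final ScopedIterator root = new ScopedIterator(null, Arrays.asList("*.o"));
        final ScopedIterator sub = root.createSubtreeIterator(Arrays.asList("*.tmp"));
        System.out.println(sub.getPatterns()); // [*.o, *.tmp]
        // "Pop": discard sub and continue with its parent, whose rules are unchanged.
        System.out.println(sub.getParent().getPatterns()); // [*.o]
    }
}
```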
#2) Be able to edit one (or more) index files during a
TreeWalk. I am (sort of) in the middle of that work on my
egit/spearce.git fork's dircache branch.
We already have a method for filtering entries during TreeWalk; it's
the TreeFilter API and its many subclasses. I wonder if the rules
for say .gitignore could simply be implemented through this API and
then allow a TreeFilter to be set directly on the WorkingTreeIterator
to "pre-filter" the entries before they get returned for merging
in the TreeWalk main loop.
I am certain we need to write new subclasses of TreeFilter to handle
fnmatch(3C) style glob rules. But they shouldn't be too difficult.
For example we already have PathFilter to perform equality testing
on path names.
Building onto TreeFilter makes for some more interesting cases,
as we can then feed globs into the revision machinery and actually
do something like `jgit log -- 'path/*.c'`, with the globbing being
done _on_the_fly_ at each tree, and not once up front by the shell.
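To make the glob idea concrete, here is a minimal matcher sketch that could
sit behind such a filter. SimpleGlob is a hypothetical name and this is not
a TreeFilter subclass. It handles only '*' and '?', and its '*' also matches
'/', so it is looser than fnmatch with FNM_PATHNAME and has no bracket
expressions.

```java
/** Hypothetical glob matcher supporting only '*' and '?'. */
final class SimpleGlob {
    private SimpleGlob() {
    }

    static boolean matches(final String pattern, final String name) {
        int p = 0;
        int n = 0;
        int starP = -1; // position of the last '*' seen in the pattern
        int starN = 0; // position in name that '*' is currently assumed to end at
        while (n < name.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starN = n;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == name.charAt(n))) {
                p++;
                n++;
            } else if (starP >= 0) {
                // Backtrack: let the last '*' swallow one more character.
                p = starP + 1;
                n = ++starN;
            } else {
                return false;
            }
        }
        // Trailing '*' in the pattern may match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*')
            p++;
        return p == pattern.length();
    }

    public static void main(final String[] args) {
        System.out.println(matches("path/*.c", "path/foo.c")); // true
        System.out.println(matches("path/*.c", "path/foo.h")); // false
    }
}
```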
> For an example see the unit test testRealisticExample() in the class
> FileIterableFactoryForAddCommandTest:
> http://repo.or.cz/w/egit/florian.git?a=blob;f=org.spearce.jgit.test/tst/org/spearce/jgit/lib/fileiteration/FileIterableFactoryForAddCommandTest.java;h=d3c78f4422c708f26ccb56434053bb711fa3116b;hb=669fd814d34e2f989b5f8eedbcb0d5bcf9743ce7
>
> You can view the patches online at:
> http://repo.or.cz/w/egit/florian.git?a=shortlog;h=refs/heads/mailinglist-patches-0
>
> I signed all patches and formatted them with the code formatter as I
> should. It's ok for me to put the patches under a dual license between a
> 3-clause BSD and the EPL[*3*]. Currently all files have a GPL 2 notice.
> I hope that is ok.
Thanks.
I'll be writing a script to edit the headers to switch GPL notice to
EDL (3-clause BSD) notice real soon, and apply it to the bleeding
edge tree. We may need to use it a few times to cover everyone's
topic branches before they merge into the main tree.
> If you want I will send the patches to the mailing list, but I don't
> know any automated way to create all the emails. I am not even sure if I
> will get them formatted correctly with Thunderbird 2. It would be cool
> if you could tell me how to send patches via command line.
I think the command you are looking for is `git send-email`.
--
Shawn.
* Re: [egit / jgit] Implementation of a file tree iteration using ignore rules.
2008-05-10 0:11 ` Shawn O. Pearce
@ 2008-05-10 15:11 ` Florian Köberle
2008-05-11 0:12 ` Shawn O. Pearce
0 siblings, 1 reply; 4+ messages in thread
From: Florian Köberle @ 2008-05-10 15:11 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi
| This is an interesting start. Did you see the existing "Main" class
| in org.spearce.jgit/src/org/spearce/jgit/pgm? It sets up and invokes
| a TextBuiltin, which is sort of like the "Command" class you added in
| your first patch. Though TextBuiltins are created on-the-fly and thus
| are harder/impossible to use to format a "jgit help".
I noticed that the class appeared after a rebase, but I haven't had a
closer look at it yet.
|
| I think your approach of building up a table of commands is likely
| the better one long term, so I am interested in seeing the two unify,
| taking the best from each (from Main and MainProgram that is).
|
| Please note that jgit is restricted to Java 5 APIs only right now.
| The "MainProgram" class you introduced uses Arrays.copyOfRange()
| which does not compile under Java 5. I guess it is new in Java 6?
Yes, it is new in Java 6. A patch fixing this is contained in the patch
set I sent to the mailing list.
| What is missing here is really two things:
|
| #1) Take .gitignore and .git/info/exclude (and other patterns) into
| account as WorkingTreeIterator (base class of FileTreeIterator)
| loops over the entries in a directory.
I had a look at the WorkingTreeIterator and it seems to me that it is
possible to reuse my Rules class there.
We could simply give the iterator a member variable of type Rules.
The method loadEntries of WorkingTreeIterator could then use the rules
class to filter out unwanted files and directories.
The constructor WorkingTreeIterator(final WorkingTreeIterator p) could
use the Rules#getRulesForSubDirectory to create a Rules instance from
the parent Rules instance.
Also note that my Rules implementation would ignore the directory a in
the case of "/a\n!/a/b.txt". This means that a directory may not appear
in the list entries, but must be used to create another iterator.
I suggest putting all the classes from the package
org.spearce.jgit.treewalk and the package
org.spearce.jgit.lib.fileiteration into one package. Please tell me
which package and I will send a patch, or do it yourself. I don't have
any outstanding changes.
I don't see an easy way of porting my Rules implementation to the
TreeFilter framework, but as you may have noticed, it may not be
necessary to do so.
| I think the command you are looking for is `git send-email`.
Thanks, after switching to a newer version of git I was able to use that
command.
Best regards,
Florian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIJbr659ca4mzhfxMRAn/ZAKCYERVJfHgZmvGFXEP+uCT0rD2RawCgqBIW
Xa4NTcAWjt8j0oMPdKMjbbY=
=koEN
-----END PGP SIGNATURE-----
* Re: [egit / jgit] Implementation of a file tree iteration using ignore rules.
2008-05-10 15:11 ` Florian Köberle
@ 2008-05-11 0:12 ` Shawn O. Pearce
0 siblings, 0 replies; 4+ messages in thread
From: Shawn O. Pearce @ 2008-05-11 0:12 UTC (permalink / raw)
To: Florian Köberle; +Cc: git
Florian Köberle <FloriansKarten@web.de> wrote:
> | This is an interesting start. Did you see the existing "Main" class
> | in org.spearce.jgit/src/org/spearce/jgit/pgm? It sets up and invokes
> | a TextBuiltin, which is sort of like the "Command" class you added in
> | your first patch. Though TextBuiltins are created on-the-fly and thus
> | are harder/impossible to use to format a "jgit help".
>
> I noticed that the class appeared after a rebase, but I haven't had a
> closer look at it yet.
I guess you started working on an older version, only to later find
out that I had also done a lot of work in the meantime. :-)
My jgit contributions come in huge bursts. RevWalk/TreeWalk was one
back in March; the transport API (almost 10,000 lines of code itself)
is the most recent from late April/early May. Once it's fully in
the mainline I'll probably have to slow down for a couple of months.
I have to move in July and have a lot of things to do between now
and then.
> | Please note that jgit is restricted to Java 5 APIs only right now.
> | The "MainProgram" class you introduced uses Arrays.copyOfRange()
> | which does not compile under Java 5. I guess it is new in Java 6?
>
> Yes, it is new in Java 6. A patch fixing this is contained in the patch
> set I sent to the mailing list.
To keep the history bisectable as much as possible it is better
if you use `git rebase -i` to squash these two changes together,
so that we never introduce the Java 6 usage into the codebase.
> I had a look at the WorkingTreeIterator and it seems to me that it is
> possible to reuse my Rules class there.
>
> We could simply give the iterator a member variable of type Rules.
>
> The method loadEntries of WorkingTreeIterator could then use the rules
> class to filter out unwanted files and directories.
Yea, that sounds right.
> Also note that my Rules implementation would ignore the directory a in
> the case of "/a\n!/a/b.txt". This means that a directory may not appear
> in the list entries, but must be used to create another iterator.
Ouch. I forgot about that fun corner case. In the context of a
TreeWalk directory "a" must actually still be reported as an entry so
that the TreeWalk main loop knows to enter into the subtree iterator.
However the subtree iterator needs to only have entry "b.txt" within
its entry list.
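A small sketch of this corner case, under simplifying assumptions: rules
are exact paths (no globs), the last matching rule wins, and the class
IgnoreSketch with its methods is hypothetical rather than the Rules class
from the patches. It follows the behaviour described in this thread, where
an ignored directory is still entered when a "!" rule points below it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, glob-free model of ignore rules with "!" re-inclusion. */
final class IgnoreSketch {
    private final List<String> paths = new ArrayList<String>();

    private final List<Boolean> negated = new ArrayList<Boolean>();

    void addRule(final String line) {
        final boolean neg = line.startsWith("!");
        paths.add(neg ? line.substring(1) : line);
        negated.add(Boolean.valueOf(neg));
    }

    /** Last matching rule wins; a rule matches the path itself or anything below it. */
    boolean isIgnored(final String path) {
        boolean ignored = false;
        for (int i = 0; i < paths.size(); i++) {
            final String r = paths.get(i);
            if (path.equals(r) || path.startsWith(r + "/"))
                ignored = !negated.get(i).booleanValue();
        }
        return ignored;
    }

    /** An ignored directory must still be entered if a "!" rule points below it. */
    boolean mustEnter(final String dir) {
        if (!isIgnored(dir))
            return true;
        for (int i = 0; i < paths.size(); i++)
            if (negated.get(i).booleanValue() && paths.get(i).startsWith(dir + "/"))
                return true;
        return false;
    }

    public static void main(final String[] args) {
        final IgnoreSketch rules = new IgnoreSketch();
        rules.addRule("/a");
        rules.addRule("!/a/b.txt");
        System.out.println(rules.isIgnored("/a")); // true
        System.out.println(rules.mustEnter("/a")); // true
        System.out.println(rules.isIgnored("/a/b.txt")); // false
        System.out.println(rules.isIgnored("/a/c.txt")); // true
    }
}
```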
> I suggest putting all the classes from the package
> org.spearce.jgit.treewalk and the package
> org.spearce.jgit.lib.fileiteration into one package. Please tell me
> which package and I will send a patch, or do it yourself. I don't have
> any outstanding changes.
The treewalk package is already established so I would say add
them there. Since you are the original developer and your code is
not yet in mainline I would ask that you perform the renames.
> I don't see an easy way of porting my Rules implementation to the
> TreeFilter framework, but as you may have noticed, it may not be
> necessary to do so.
The TreeFilter framework is perhaps not the right API for ignore
rules, that is likely true. It also works with paths as byte[] and
not as String, because we get byte[] (generally UTF-8 encoded) data
from canonical tree objects when reading from the object database.
Avoiding the conversion for most entries is a huge performance
improvement for us.
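To illustrate what matching on raw bytes can look like, the sketch below
tests a path suffix directly on a byte[] without ever building a String
for the path. RawPath and its methods are hypothetical names, not existing
jgit API, and the ASCII-only encoder is there just to keep the example
self-contained.

```java
/** Hypothetical helper for comparing raw path bytes. */
final class RawPath {
    private RawPath() {
    }

    /** Encodes an ASCII-only constant once, e.g. when a filter is constructed. */
    static byte[] ascii(final String s) {
        final byte[] r = new byte[s.length()];
        for (int i = 0; i < r.length; i++)
            r[i] = (byte) s.charAt(i);
        return r;
    }

    /** True if path ends with suffix; the path is never decoded to a String. */
    static boolean endsWith(final byte[] path, final byte[] suffix) {
        if (suffix.length > path.length)
            return false;
        final int off = path.length - suffix.length;
        for (int i = 0; i < suffix.length; i++)
            if (path[off + i] != suffix[i])
                return false;
        return true;
    }

    public static void main(final String[] args) {
        final byte[] suffix = ascii(".c"); // prepared once, reused for every entry
        System.out.println(endsWith(ascii("foo/bar.c"), suffix)); // true
        System.out.println(endsWith(ascii("foo/bar.h"), suffix)); // false
    }
}
```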
I still won't give up my silly dream for `jgit log -- 'foo/*.c'`,
but maybe we do need two different implementations to make things
work out well.
--
Shawn.