Hacking git for managing machine readable "source" files

From: Christian Gagneraud @ 2015-10-12  3:32 UTC
  To: git

Hi git hackers,

I have been scratching my head for quite a few weeks to see if and how
I could hack git to manage non-software-source-code files. These files
might be text-based (XML, JSON, custom format, ...) but are not intended
for humans, so diffing and merging them using standard git features
doesn't really make sense (and so the whole "pack" stuff seems useless
as well). These files represent a non-software project developed using a
graphical SW application. I'm talking here about designing and
simulating electronic projects, but it could apply to any sort of
engineering (mechanical design comes second for me).

I would like to provide support for diffing, merging, branching and 
forking such electronics projects.

I know that git is not a conventional SCM and, as such, doesn't rely on
incremental diffs (as CVS, SVN, ... do), but...

My graphical software uses a document/command based approach, that is,
it doesn't directly transform user interaction into graphical changes.
Instead, graphical tools generate commands that are then executed on a
document; once a command completes, the graphical view updates its
content.

So far, in my context, a document is simply a tree of objects, and the
lowest-level commands available are:
- Insert an object in the tree.
- Remove an object from the tree.
- Modify an object property.
All higher-level commands are built in terms of the above basic commands.
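
A minimal sketch of those three primitives, assuming an in-memory tree
keyed by object id. The names Node, Document, insert, remove, modify and
replace are hypothetical and only illustrate the idea; they are not taken
from any existing tool.

```python
class Node:
    """One object in the document tree (hypothetical layout)."""

    def __init__(self, object_id, properties=None):
        self.object_id = object_id
        self.properties = dict(properties or {})
        self.children = {}  # object_id -> Node


class Document:
    def __init__(self):
        self.root = Node("root")
        self._index = {"root": self.root}  # object_id -> Node
        self._parent = {}                  # object_id -> parent object_id

    def insert(self, parent_id, object_id, **properties):
        """Primitive 1: insert an object in the tree."""
        if object_id in self._index:
            raise KeyError("duplicate object id: " + object_id)
        node = Node(object_id, properties)
        self._index[parent_id].children[object_id] = node
        self._index[object_id] = node
        self._parent[object_id] = parent_id

    def remove(self, object_id):
        """Primitive 2: remove an object (and its subtree) from the tree."""
        node = self._index[object_id]
        for child_id in list(node.children):
            self.remove(child_id)
        parent_id = self._parent.pop(object_id)
        del self._index[parent_id].children[object_id]
        del self._index[object_id]

    def modify(self, object_id, name, value):
        """Primitive 3: modify one property of an object."""
        self._index[object_id].properties[name] = value

    def replace(self, parent_id, old_id, new_id, **properties):
        """A higher-level command expressed purely with the primitives."""
        self.remove(old_id)
        self.insert(parent_id, new_id, **properties)
```

Because replace only calls remove and insert, recording the primitives is
enough to reproduce any higher-level command.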

This is, IMHO, an "interesting" feature in the context of traditional
SCMs. Instead of storing incremental diffs, I could store incremental
commands (I know it would be dead slow, but it would definitely work).

Since git is simply a "content addressable" file system, I can (using
plumbing commands) create my own system to store my machine-readable
project: a tree of documents, documents being themselves trees of
objects. This fits pretty well with git commit, tree and blob objects.
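
To make "content addressable" concrete, here is a small sketch of how a
blob's name is derived from its content, assuming a repository using the
default SHA-1 object format. The function name git_blob_id is made up for
this example; the result should match what "git hash-object" prints for
the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git assigns to a blob with this content.

    The id is the SHA-1 of a small header ("blob <size>" plus a NUL byte)
    followed by the raw content, so identical content always gets the
    same id, wherever it sits in the tree.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Two objects with identical serialised content share one blob.
resistor_a = b'{"type": "resistor", "value": "10k"}\n'
resistor_b = b'{"type": "resistor", "value": "10k"}\n'
same_blob = git_blob_id(resistor_a) == git_blob_id(resistor_b)
```

One consequence for a one-blob-per-object layout is that unchanged objects
cost nothing extra from one commit to the next.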

I could even store a serialised command stack (as a tree of command 
objects, again git fits very well here) along with a commit. This would 
represent the set of operations (I call this a document transaction) to 
transform the git document tree associated with the previous commit into 
the git document tree associated with the current commit.
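
A sketch of what replaying such a document transaction could look like.
It assumes, purely for brevity, a flat document ({object id: properties})
and a JSON-serialised transaction; the field names "op", "id", "property"
and "value" are invented for this example.

```python
import copy
import json


def apply_transaction(document, transaction):
    """Replay a transaction on a copy of the document and return the copy."""
    result = copy.deepcopy(document)
    for cmd in transaction["commands"]:
        op = cmd["op"]
        if op == "insert":
            result[cmd["id"]] = dict(cmd.get("properties", {}))
        elif op == "remove":
            del result[cmd["id"]]
        elif op == "modify":
            result[cmd["id"]][cmd["property"]] = cmd["value"]
        else:
            raise ValueError("unknown command: " + op)
    return result


# The document as of the previous commit.
previous = {"R1": {"value": "10k"}, "C1": {"value": "1u"}}

# The serialised transaction that would be stored along with the new commit.
serialised = json.dumps({
    "message": "Lower the filter cutoff",
    "commands": [
        {"op": "modify", "id": "R1", "property": "value", "value": "22k"},
        {"op": "insert", "id": "C2", "properties": {"value": "100n"}},
        {"op": "remove", "id": "C1"},
    ],
})

current = apply_transaction(previous, json.loads(serialised))
```

The same function could replay a transaction on top of a different
document version, which is where a command log starts to resemble a
rebase.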

I feel very confident that I could create wrappers around git plumbing
commands to implement my 3 basic document commands (that would work on
the index):
mygit insert-object <document> <object-id>
mygit remove-object <document> <object-id>
mygit change-object <document> <object-id> <property-id> <property-value>
Of course, for this to work, "mygit" needs to be aware of the low-level
file format (XML, JSON, ...), but "mygit" doesn't need to know how to
interpret the whole document.
Storing my document transactions in git would definitely help with
merging (automatic or manual) and diffing, since document transactions
would have some extra meta-data that tells what the user really did and
why they did it, hence giving hints to the algorithm or the end user on
how to solve a merge conflict, for example.

Now, from there, I don't know what would be the best approach for
diffing and merging. Should I completely replace the git pack, diff and
merge features? Should I rely on my concept of command and document
transaction? Maybe I should keep the pack feature and simply implement
diff and merge using a "clever" algorithm? (Just by looking at 2 versions
of a document, the algorithm is able to detect what the purpose of
the change was and replay it on top of another document version.)

I'm pretty sure I'm not the first person to look into this. I
would be glad if anyone could provide feedback from their own
experience, advice on how to move forward, or simply criticism or
pointers to literature or existing projects.

Thanks,
Chris

* Re: Hacking git for managing machine readable "source" files
From: Michael J Gruber @ 2015-10-12  8:07 UTC
  To: Christian Gagneraud, git

Christian Gagneraud venit, vidit, dixit 12.10.2015 05:32:
> Hi git hackers,
> 
> I have been scratching my head for quite a few weeks to see if and how
> I could hack git to manage non-software-source-code files. These files
> might be text-based (XML, JSON, custom format, ...) but are not intended
> for humans, so diffing and merging them using standard git features
> doesn't really make sense (and so the whole "pack" stuff seems useless
> as well). These files represent a non-software project developed using a
> graphical SW application. I'm talking here about designing and
> simulating electronic projects, but it could apply to any sort of
> engineering (mechanical design comes second for me).
> 
> I would like to provide support for diffing, merging, branching and 
> forking such electronics projects.

[wall of text snipped]

I don't think you need to map the tree structure of your project to that
of git's object store, nor am I sure you would benefit from it. (In case
you do want to do it - look at the git-notes implementation.)

There are four handles in git's interface that you can use (and that
have been used):

A) clean/smudge filters: They are meant to transform your working tree
copy into a "standard/canonical form" which is stored in the repo (and
back).

As an example, uncompressing compressed file formats, removing
superfluous comments or time-stamps, sorting in default order (for
unordered files) produces objects in the repo which are a better fit for
packing and possibly also for git's default diff.
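
As an illustration of (A), here is a sketch of the core of a clean filter
for a JSON-based format. Git feeds the working tree content to the filter
on stdin and stores whatever it writes to stdout; the filter is wired up
with a "filter=<name>" attribute in .gitattributes and a
filter.<name>.clean config entry. The key names in VOLATILE_KEYS and the
function name clean are assumptions made up for this example.

```python
import json

# Hypothetical keys that change on every save without carrying any meaning.
VOLATILE_KEYS = {"timestamp", "last_saved"}


def clean(text: str) -> str:
    """Return a canonical form: volatile keys dropped, object keys sorted."""

    def strip(value):
        if isinstance(value, dict):
            return {k: strip(v) for k, v in value.items()
                    if k not in VOLATILE_KEYS}
        if isinstance(value, list):
            # List order is kept because it may be significant.
            return [strip(v) for v in value]
        return value

    return json.dumps(strip(json.loads(text)), sort_keys=True, indent=1) + "\n"
```

A real filter script would end with something like
sys.stdout.write(clean(sys.stdin.read())). Two saves that differ only in
key order or timestamps then produce the same blob.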

B) textconv filters: Possibly lossy filters that produce a human
readable form of an object which possibly also lends itself to a
meaningful git diff (but no way back). Can be cached.
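
A sketch of the kind of conversion a textconv filter for (B) might
perform, again assuming a JSON-based format. Git runs the program named in
diff.<driver>.textconv with the path of a file to convert as its argument
and diffs the text it prints; diff.<driver>.cachetextconv enables the
caching. The function name describe and the "path = value" layout are
invented for this example.

```python
import json


def describe(text: str) -> str:
    """Flatten a JSON document into sorted 'path = value' lines.

    One line per leaf value means a plain line-based diff shows exactly
    which property of which object changed.
    """
    lines = []

    def walk(value, path):
        if isinstance(value, dict):
            for key in sorted(value):
                walk(value[key], path + "/" + key)
        elif isinstance(value, list):
            for index, item in enumerate(value):
                walk(item, path + "/" + str(index))
        else:
            lines.append(path + " = " + repr(value))

    walk(json.loads(text), "")
    return "\n".join(lines) + "\n"
```

A real textconv script would read the file named by sys.argv[1] and print
describe() of its content. The conversion is lossy (original formatting is
gone), which is fine since it is only used for display.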

C) external diff drivers: They are supposed to produce a meaningful diff
in cases where textconv+default diff are not enough. They simply receive
both objects to diff.
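
A sketch of the comparison an external diff driver for (C) could run. Git
calls such a driver (set through GIT_EXTERNAL_DIFF or
diff.<driver>.command) with seven parameters: path, old-file, old-hex,
old-mode, new-file, new-hex, new-mode. The sketch only covers the part
after both files have been loaded, and assumes a flat
{object id: properties} document; diff_documents is a hypothetical name.

```python
def diff_documents(old, new):
    """Describe the change from old to new as insert/remove/modify steps."""
    changes = []
    for object_id in sorted(old.keys() - new.keys()):
        changes.append(("remove", object_id))
    for object_id in sorted(new.keys() - old.keys()):
        changes.append(("insert", object_id, new[object_id]))
    for object_id in sorted(old.keys() & new.keys()):
        before, after = old[object_id], new[object_id]
        for prop in sorted(before.keys() | after.keys()):
            if before.get(prop) != after.get(prop):
                changes.append(("modify", object_id, prop,
                                before.get(prop), after.get(prop)))
    return changes
```

The output is phrased in the same insert/remove/modify vocabulary as the
document commands from the original mail, so it reads as "what was done"
rather than "which lines changed".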

D) external merge drivers: They are supposed to merge (non-text) files
that git cannot merge.
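
A sketch of the three-way logic a merge driver for (D) might apply. Git
invokes the command from merge.<driver>.driver with placeholders for the
ancestor (%O), current (%A) and other (%B) versions; the driver is
expected to leave the result in the %A file and exit non-zero if conflicts
remain. The sketch assumes flat {key: value} documents, and merge3 is a
made-up name.

```python
_MISSING = object()  # marks a key that is absent on one side


def merge3(base, ours, theirs):
    """Three-way merge of flat mappings. Returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it, differently
            conflicts.append(key)
            result = o    # keep ours and report the conflict
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts
```

Changes to different objects merge cleanly even if a text merge of the
serialised file would have conflicted; only a key changed differently on
both sides is reported.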

You'll find pointers in the manual pages for git-diff, git-merge and
gitattributes.

Michael
