git.vger.kernel.org archive mirror
* Merge-friendly text-based data storage
@ 2012-03-26 14:19 Richard Hartmann
  2012-03-26 18:17 ` Junio C Hamano
  2012-03-27  9:12 ` Holger Hellmuth
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Hartmann @ 2012-03-26 14:19 UTC
  To: Git List

Hi all,

I am looking for information on how to design a merge-friendly data
layout. Oddly enough, there does not seem to be much online other than
the obvious "use text-based lines, one per data point".

My current plan looks like:

  metamonger\tversion: 0
  filename\towner_name\tgroup_name\tetc\tpp
  ##########
  file1\trichih\trichih\tfoo\tbar
  relative/path/to/file2\troot\troot\tfoo\tbar

The two upper lines are designed to make a merge fail if the version of
the file layout changes. Anything starting with a hash mark (#) is a
comment and will be ignored.

All other lines hold data about individual files. Relative paths are
allowed; absolute paths and paths leading above the top directory
("../") are forbidden for security reasons. Values are tab-separated,
as the format is expressly meant to be edited by hand. Hex, if needed,
would be ASCII-armoured.

As long as there are no lines that start with the same file name, this
file format would allow for efficient merging _if_ git has an internal
concept of line identifiers.


Are there any considerations I missed? Are there any design
guides/best practices to follow?



Thanks,
Richard

* Re: Merge-friendly text-based data storage
  2012-03-26 14:19 Merge-friendly text-based data storage Richard Hartmann
@ 2012-03-26 18:17 ` Junio C Hamano
  2012-03-26 19:06   ` Richard Hartmann
  2012-03-27  9:12 ` Holger Hellmuth
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2012-03-26 18:17 UTC
  To: Richard Hartmann; +Cc: Git List

Richard Hartmann <richih.mailinglist@gmail.com> writes:

> As long as there are no lines that start with the same file name, this
> file format would allow for efficient merging _if_ git has an internal
> concept of line identifiers.

You can write a custom low-level merge driver, and use the attribute
system to mark that your file is meant to be handled by that ll-merge
driver.  There is no need for Git to have "an internal concept of line
identifiers".

It may be of interest to run "git help attributes" and read up on
"Defining a custom merge driver" section.
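As a rough illustration of what such a driver could look like, here is a sketch of a keyed three-way merge. Git runs a custom driver with the ancestor, current and other versions (the %O, %A and %B placeholders), expects the result to be left in the %A file, and treats a non-zero exit status as a conflict. Everything else is an assumption made for the sketch: the script name, merging by the first tab-separated field, refusing any header change, sorting the output by path, and dropping comments other than the separator line.

```python
import sys


def split_file(text):
    """Split into (header_lines, {path: record_line}).

    Assumes two header lines, "#" comments, and records keyed by their
    first tab-separated field. Comments are not preserved.
    """
    lines = text.splitlines()
    header, records = lines[:2], {}
    for line in lines[2:]:
        if line and not line.startswith("#"):
            records[line.split("\t", 1)[0]] = line
    return header, records


def merge_records(base, ours, theirs):
    """Three-way merge keyed by path; returns (merged, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (including both deleting it)
            result = o
        elif o == b:      # only "theirs" changed this record
            result = t
        elif t == b:      # only "ours" changed this record
            result = o
        else:             # both changed it differently: a real conflict
            conflicts.append(key)
            result = o    # keep ours; a fuller driver might write markers
        if result is not None:
            merged[key] = result
    return merged, conflicts


def main(base_path, ours_path, theirs_path):
    texts = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, encoding="utf-8") as f:
            texts.append(f.read())
    (hb, b), (ho, o), (ht, t) = (split_file(x) for x in texts)
    if not (hb == ho == ht):
        print("version/header lines differ; refusing to merge", file=sys.stderr)
        return 1
    merged, conflicts = merge_records(b, o, t)
    # Git expects the merge result to be written to the %A (ours) file.
    body = [merged[k] for k in sorted(merged)]
    with open(ours_path, "w", encoding="utf-8") as f:
        f.write("\n".join(ho + ["#" * 10] + body) + "\n")
    for key in conflicts:
        print("conflicting changes for %s" % key, file=sys.stderr)
    return 1 if conflicts else 0


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Invoked by git as: <script> %O %A %B
    sys.exit(main(sys.argv[1], sys.argv[2], sys.argv[3]))
```

It would be wired up with something like git config merge.metamonger.driver "python3 metamonger_merge.py %O %A %B" plus a .gitattributes line such as "metadata.txt merge=metamonger"; the driver, script and file names there are made up for the example.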

* Re: Merge-friendly text-based data storage
  2012-03-26 18:17 ` Junio C Hamano
@ 2012-03-26 19:06   ` Richard Hartmann
  2012-03-26 19:51     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Hartmann @ 2012-03-26 19:06 UTC
  To: Junio C Hamano; +Cc: Git List

On Mon, Mar 26, 2012 at 20:17, Junio C Hamano <gitster@pobox.com> wrote:

> It may be of interest to run "git help attributes" and read up on
> "Defining a custom merge driver" section.

Sounds good, thanks.

Does my file layout look fine?


-- 
Richard

* Re: Merge-friendly text-based data storage
  2012-03-26 19:06   ` Richard Hartmann
@ 2012-03-26 19:51     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2012-03-26 19:51 UTC
  To: Richard Hartmann; +Cc: Git List

Richard Hartmann <richih.mailinglist@gmail.com> writes:

> On Mon, Mar 26, 2012 at 20:17, Junio C Hamano <gitster@pobox.com> wrote:
>
>> It may be of interest to run "git help attributes" and read up on
>> "Defining a custom merge driver" section.
>
> Sounds good, thanks.
>
> Does my file layout look fine?

I have no opinion on it. It is for the consumers of your datafile (the
ones that read it and find these databasy items in it, and your custom
merge driver) to decide.

* Re: Merge-friendly text-based data storage
  2012-03-26 14:19 Merge-friendly text-based data storage Richard Hartmann
  2012-03-26 18:17 ` Junio C Hamano
@ 2012-03-27  9:12 ` Holger Hellmuth
  2012-03-27 13:01   ` Richard Hartmann
  1 sibling, 1 reply; 8+ messages in thread
From: Holger Hellmuth @ 2012-03-27  9:12 UTC
  To: Richard Hartmann; +Cc: Git List

On 26.03.2012 16:19, Richard Hartmann wrote:
> Hi all,
>
> I am looking for information on how to design a merge-friendly data
> layout. Oddly enough, there does not seem to be much online other than
> the obvious "use text-based lines, one per data point".
>
> My current plan looks like:
>
>    metamonger\tversion: 0
>    filename\towner_name\tgroup_name\tetc\tpp
>    ##########
>    file1\trichih\trichih\tfoo\tbar
>    relative/path/to/file2\troot\troot\tfoo\tbar
>
> The two upper lines are designed to make a merge fail if the version of
> the file layout changes. Anything starting with a hash mark (#) is a
> comment and will be ignored.

I may be misunderstanding something, but let's assume you want to merge a
file that has "version: 0" with one that has "version: 1", and their last
common ancestor naturally has "version: 0". Then the merge would not
fail, even though the file layout changes.
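The reasoning above follows from the basic three-way rule: if only one side changed a region relative to the common ancestor, that side's version is taken without conflict. The sketch below models that rule for a single line; it is a simplification (real merges work on diff hunks with surrounding context, not isolated lines), and the name merge_line is only for illustration.

```python
def merge_line(base, ours, theirs):
    """Resolve one line the way a line-based three-way merge would.

    Returns (result, conflicted). Simplified model: git actually operates
    on hunks, but the decision rule per changed region is the same.
    """
    if ours == theirs:       # identical on both sides (changed or not)
        return ours, False
    if ours == base:         # only "theirs" touched it: take theirs
        return theirs, False
    if theirs == base:       # only "ours" touched it: take ours
        return ours, False
    return None, True        # both changed it differently: conflict


# The version bump from one side wins silently:
result, conflicted = merge_line("metamonger\tversion: 0",
                                "metamonger\tversion: 0",
                                "metamonger\tversion: 1")
```

Here result is the "version: 1" line and conflicted is False, which is exactly why a header line alone cannot force the merge to fail.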

And there would be spurious merge conflicts whenever both sides add lines
at the same position, even if those lines describe different files.

The normal merging in git isn't suited for this task; it has different
objectives. Without a custom merge driver, as Junio suggested, the only
way would be to store each data line in its own file. As you store file
paths, that would even fit, but I doubt it is what you had in mind.

* Re: Merge-friendly text-based data storage
  2012-03-27  9:12 ` Holger Hellmuth
@ 2012-03-27 13:01   ` Richard Hartmann
  2012-03-27 15:21     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Hartmann @ 2012-03-27 13:01 UTC
  To: Holger Hellmuth; +Cc: Git List

On Tue, Mar 27, 2012 at 11:12, Holger Hellmuth <hellmuth@ira.uka.de> wrote:

> I may be misunderstanding something, but let's assume you want to merge a
> file that has "version: 0" with one that has "version: 1" and their last
> common ancestor would have "version: 0" naturally. So the merge would not
> fail even though the file layout changes.

Ugh, I did not consider that. I can't come up with a way, other than a
custom merge driver, to prevent this. Am I correct?


> And there would be random merge failures with lines added at the same line
> number even if different.

Yes, I know. That was the main reason why I asked for merge-friendly
designs. I briefly considered union merges, but that's not a good idea
for obvious reasons.
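One concrete reason the union strategy fits poorly here: git's built-in "merge=union" attribute keeps the lines from both sides instead of leaving conflict markers, so if both branches edit the record for the same file, the result silently contains two records for one path. A post-merge check along these lines (a hypothetical helper, assuming the layout from the first message) would at least detect that.

```python
from collections import Counter


def duplicate_keys(text):
    """Return the paths that occur on more than one data line.

    Assumes two header lines, "#" comments, and tab-separated records
    whose first field is the path.
    """
    keys = [line.split("\t", 1)[0]
            for line in text.splitlines()[2:]
            if line and not line.startswith("#")]
    return sorted(key for key, count in Counter(keys).items() if count > 1)


# Both branches changed the owner of file1; a union merge keeps both lines.
after_union = ("metamonger\tversion: 0\n"
               "filename\towner_name\tgroup_name\tetc\tpp\n"
               "##########\n"
               "file1\trichih\trichih\tfoo\tbar\n"
               "file1\troot\trichih\tfoo\tbar\n")
```

Calling duplicate_keys(after_union) reports file1, but only after the damage is done; a keyed driver could instead report the conflict during the merge.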


> The normal merging in git isn't suited for this task; it has different
> objectives. Without a custom merge driver, as Junio suggested

I more or less accepted that I will have to write one, eventually.


> the only way
> would be to store each data line in its own file. As you store file paths
> that would even fit, but I doubt it is what you had in mind.

I considered this as well, but that's extremely expensive and wasteful.


-- 
Richard

* Re: Merge-friendly text-based data storage
  2012-03-27 13:01   ` Richard Hartmann
@ 2012-03-27 15:21     ` Junio C Hamano
  2012-03-27 15:46       ` Holger Hellmuth
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2012-03-27 15:21 UTC
  To: Richard Hartmann; +Cc: Holger Hellmuth, Git List

Richard Hartmann <richih.mailinglist@gmail.com> writes:

> On Tue, Mar 27, 2012 at 11:12, Holger Hellmuth <hellmuth@ira.uka.de> wrote:
>
>> I may be misunderstanding something, but let's assume you want to merge a
>> file that has "version: 0" with one that has "version: 1" and their last
>> common ancestor would have "version: 0" naturally. So the merge would not
>> fail even though the file layout changes.
>
> Ugh, I did not consider that. I can't come up with a way, other than a
> custom merge driver, to prevent this. Am I correct?

You are the only judge of that statement: "I can't come up with...".

I can't either, but I know a custom ll-merge driver would work.  It is
designed for this kind of thing.  It will know both the version 0 and
version 1 formats, read from each, and write out the merged result in
whatever format it wants to use.
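One way a driver could "know both formats" is to upgrade every input (ancestor, ours, theirs) to the newest version it understands before merging, and always write that newest version back. The sketch below shows only that normalization step; the version 1 change (an extra "mode" column defaulting to "-") is invented purely for illustration, since the thread never defines version 1.

```python
# Hypothetical upgrade steps, one per version bump (N -> N+1).
# The v0 -> v1 change below is made up for the example.
UPGRADES = {
    0: lambda columns, rows: (columns + ["mode"], [row + ["-"] for row in rows]),
}
LATEST = 1


def normalize(version, columns, rows):
    """Upgrade parsed data step by step until it is in the LATEST format."""
    if version > LATEST:
        # Never guess at a format newer than the driver knows about.
        raise ValueError("file is newer than this driver understands: v%d" % version)
    while version < LATEST:
        columns, rows = UPGRADES[version](columns, rows)
        version += 1
    return version, columns, rows
```

After normalizing all three inputs, the driver can run an ordinary keyed merge, so a version bump on one branch no longer turns into a silent mix of formats.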

>> the only way
>> would be to store each data line in its own file. As you store file paths
>> that would even fit, but I doubt it is what you had in mind.
>
> I considered this as well, but that's extremely expensive and wasteful.

And it does not solve anything.  The "version" file may cleanly merge to a
new version, and there is no way for the merge result of "version" file to
affect the outcome of merges in other files.

* Re: Merge-friendly text-based data storage
  2012-03-27 15:21     ` Junio C Hamano
@ 2012-03-27 15:46       ` Holger Hellmuth
  0 siblings, 0 replies; 8+ messages in thread
From: Holger Hellmuth @ 2012-03-27 15:46 UTC
  To: Junio C Hamano; +Cc: Richard Hartmann, Git List

On 27.03.2012 17:21, Junio C Hamano wrote:
> Richard Hartmann <richih.mailinglist@gmail.com> writes:
>
>>> the only way
>>> would be to store each data line in its own file. As you store file paths
>>> that would even fit, but I doubt it is what you had in mind.
>>
>> I considered this as well, but that's extremely expensive and wasteful.
>
> And it does not solve anything.  The "version" file may cleanly merge to a
> new version, and there is no way for the merge result of "version" file to
> affect the outcome of merges in other files.

It solves the data merging.

And since a version change is presumably a rare event, it could be
handled by a merge hook that simply aborts the merge with a message
explaining how to update the older version, commit, and then merge.
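The check such a hook would perform can be sketched as a pure function that compares the version lines of the two sides and, on mismatch, produces the "update, commit, then merge" message. How it gets wired into a hook or wrapper script is left open here; the function names and message wording are hypothetical.

```python
def read_version(text):
    """Extract N from a first line of the form "metamonger<TAB>version: N"."""
    first = text.splitlines()[0] if text else ""
    magic, _, field = first.partition("\t")
    if magic != "metamonger" or not field.startswith("version: "):
        raise ValueError("not a metamonger file")
    return int(field[len("version: "):])


def check_versions(ours_text, theirs_text):
    """Return None if both sides use the same layout version, else a message."""
    ours, theirs = read_version(ours_text), read_version(theirs_text)
    if ours == theirs:
        return None
    older = "ours" if ours < theirs else "theirs"
    return ("layout version mismatch (ours: %d, theirs: %d); upgrade the %s side "
            "to version %d, commit, then merge again"
            % (ours, theirs, older, max(ours, theirs)))
```

Because the abort happens before any line-level merging, this sidesteps the problem Holger raised earlier, where the version line itself merges cleanly.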
