* [RFC] gitweb TODO
@ 2006-11-17 18:01 Jakub Narebski
2006-11-17 19:22 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2006-11-17 18:01 UTC (permalink / raw)
To: git
These are a few gitweb issues and features I'm currently working on
(or plan working on).
1. New patchset view (commitdiff, blobdiff)
In "old" gitweb commitdiff view was generated by iterating over lines of
git-diff-tree raw format output, and generating diffs using
git-cat-file and external diff utility (/usr/bin/diff). This required
having temporary directory for diff generation, and of course diffs
didn't have extended git headers.
The "new" commitdiff view is generated from single git-diff-tree
--raw-with-path output. But I have made incorrect assumption that one
line from "raw" diff-tree output always corresponds to only one patch
in the patchset part of output. This is not the case. I'm not sure if
those are the only cases when patch is broken, but changing file into
symlink or symlink into file ('T' status), and explicit breaking ('B'
status) generates two patches to one line of raw difftree output. The
second is not of much importance for gitweb, unless yoy add -B to
@diff_opts, but the first is important; it is currently broken, see
http://tinyurl.com/y3cfop
(commit 4c52c0d31f0f7142d81a465c40789befc2e86548 on
gitweb-test-funny-char branch in git.git repository).
I have thought of the following (mutually exclusive) ways to fix this
a. Change core git git-diff-tree command to not break (some?) of T
changes into two patches. From what gitster said on #git this
feature is for git-diff patches to be patch(1) compatibile; but
-M already causes patches to be incompatibile with patch. I'm
thinking here about adding some kind of -s/--single-patch
--do-not-break-patch-into-two-please command line option for
git-diff
b. Check the raw difftree line for status and perhaps other info
to know if the line generates more than one patch. It needs detailed
knowledge about _when_ git-diff generates more than one patch to one
"raw" format line, and would break if core diff changes in that
detail. Simplest to implement, I think...
Could you tell me all the cases when git generates more than one
patch for one "raw" diff format line, please?
c. "Cache" git diff header, or the whole patch, or the whole patchset.
It is enough to cache (write lines to "buffer"/"cache" array) up to
the extended header "^index" line, which can be used to check if to
go to the next dofftree "raw" line (or wven which of "raw" difftree
lines this particular patch corresponds to). Does not require
changes in diff core, and is less fragile, less susceptible to
breakage.
Which of those would be the best to implement?
2. Difftree combined diff gitweb "raw" format
Currently "commitdiff" view consist of the gitweb representation of
"raw" git-diff output (list of changed files a la git-whatchanged), and
the patchset (a la git-show). "commit" view has only list of changed
files, nearly exacttly the same as in "commitdiff" view (but with links
to blobdiff view instead of links to appropriate patch in "commitdiff"
view).
I have though about using one of the combined diff outputs for merge
commits. The problem is how to represent the whatchanged part. Which
parts of gitweb difftree output to leave? And what about the fact that
we have raw output for -c/--combined diff format, but not for chunk
simplifying --cc (compact combined) output?
3. Committags support (and implementation)
There was some proposed implementation here, both by me and by Junio,
but no definite patches were accepted. We have the following mutually
exclusive ways to do this:
a. Do esc_html first, then do all committags simultaneously. The
advantage is that it is perhaps slightly faster, the disadvantages
is additional complication in code, and the fact that regexp
defining committags have to be either on esc_html-ed pattern, or be
converted to esc_html-ed input.
b. Have a chain of committags, and do committags sequentially. This
means that we have to divide output into part to be further parsed,
and the part which should not be parsed further; here Junio proposed
wondefull idea of list mixed of strings (to be committags parsed)
and string references (to be left as is). As the last "committag"
we perform esc_html (uless for example some project stores commit
messages in HTML, XML, or some structured text like AsciiDoc or
reStructuredText).
Which of those would be better to implement?
4. Feeds (RSS, Atom,...)
There is new Atom format feed, there was request for per-branch feeds.
The feed output certainly needs cleanup. The questions are:
a.) What should be in the feed (only commit message + authorship, or
also whatchanged; what format use for whatchanged; do add information
about tags created with given window of time, etc.)?
b.) Should we use "text/xml" or (unstandarized) "application/rss+xml" as
type/Content-Type for RSS 2.01 format, should we use
"application/atom+xml" for Atom format?
c.) Should we add APP (Atom Publishing Protocol) support in addition to
OPML?
d.) Should we put log, shortlog, history, rss, atom views together in
one subroutine, and select the view using $format parameter of this
subroutine?
--
Jakub Narebski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] gitweb TODO
2006-11-17 18:01 [RFC] gitweb TODO Jakub Narebski
@ 2006-11-17 19:22 ` Junio C Hamano
2006-11-17 20:30 ` Jakub Narebski
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-11-17 19:22 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> These are a few gitweb issues and features I'm currently working on
> (or plan working on).
>
> 1. New patchset view (commitdiff, blobdiff)
>
> c. "Cache" git diff header, or the whole patch, or the whole patchset.
> It is enough to cache (write lines to "buffer"/"cache" array) up to
> the extended header "^index" line, which can be used to check if to
> go to the next dofftree "raw" line (or wven which of "raw" difftree
> lines this particular patch corresponds to). Does not require
> changes in diff core, and is less fragile, less susceptible to
> breakage.
This is the most appropriate. Right now it is not independently
controllable but it is not so inplausible for somebody to want
to get non recursive view of 'raw' part along with textual diffs
when running "--raw -p" diff and your solution c. is robust
against even such changes.
I would probably not call that "caching" but keeping track of
where you are, and you would need to know in which filepair you
are in anyway when you want to implement links to blob view from
the hunk header lines.
> 2. Difftree combined diff gitweb "raw" format
>...
> I have though about using one of the combined diff outputs for merge
> commits. The problem is how to represent the whatchanged part. Which
> parts of gitweb difftree output to leave? And what about the fact that
> we have raw output for -c/--combined diff format, but not for chunk
> simplifying --cc (compact combined) output?
I am not sure what's more useful to show there. The part of the
output shows "the list of files that have changed by this
commit" for non-merge commits, so "the list of files that have
changed in this merge" is what you would want to show, but there
are three ways you can define "what changed" for a merge (see
the message that explains --cc to linux@horizontal I sent
yesterday). I find "diff --name-status $it^1 $it" the most
natural semantics for the most of the time, and that is what we
do for --stat output.
If you want to treat all the parents equally, you could read
from "diff-tree -c --raw $it" which would show list of files
that needed file-level merges, along with the blob object names
from all parents.
We might want to give that when "diff-tree --cc --patch-with-raw
$it" is asked for in the "raw" part. I dunno.
> 3. Committags support (and implementation)
>...
> Which of those would be better to implement?
I think that is a minor implementation detail; I think the
latter is probably less painful to maintain in the longer run
because if you encode things in earlier stages, postprocessing
stages need to know which part of the result from the earlier
processing needs decoding before further processing, but I
suspect you will be the one primarily hacking on that part, so
it is not my preference, but just a suggestion to make your life
less painful.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] gitweb TODO
2006-11-17 19:22 ` Junio C Hamano
@ 2006-11-17 20:30 ` Jakub Narebski
2006-11-17 21:08 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2006-11-17 20:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> These are a few gitweb issues and features I'm currently working on
>> (or plan working on).
>>
>> 1. New patchset view (commitdiff, blobdiff)
>>
>> c. "Cache" git diff header, or the whole patch, or the whole patchset.
>> It is enough to cache (write lines to "buffer"/"cache" array) up to
>> the extended header "^index" line, which can be used to check if to
>> go to the next dofftree "raw" line (or wven which of "raw" difftree
>> lines this particular patch corresponds to). Does not require
>> changes in diff core, and is less fragile, less susceptible to
>> breakage.
>
> This is the most appropriate. Right now it is not independently
> controllable but it is not so inplausible for somebody to want
> to get non recursive view of 'raw' part along with textual diffs
> when running "--raw -p" diff and your solution c. is robust
> against even such changes.
What about the fact that git-diff -M is _not_ patch-compatibile;
does creating two patches for one difftree raw format line for
mode change/'T' status (I guess only for "type" mode changes, i.e.
file to/from symlink, file to/from directory) helps understanding
the change? If not, perhaps it would be better to introduce option
to not break the patch...
> I would probably not call that "caching" but keeping track of
> where you are, and you would need to know in which filepair you
> are in anyway when you want to implement links to blob view from
> the hunk header lines.
I'd say "buffering" rather than "caching". The problem is that
you have to read up to the "index <hash>..<hash>[ <mode>]" to
check if you have to go to the next raw difftree line or not.
And the info from difftree line is needed to hyperlink parts
of extended git diff header (and also to hyperlink parts of
patch).
>> 2. Difftree combined diff gitweb "raw" format
>>...
>> I have though about using one of the combined diff outputs for merge
>> commits. The problem is how to represent the whatchanged part. Which
>> parts of gitweb difftree output to leave? And what about the fact that
>> we have raw output for -c/--combined diff format, but not for chunk
>> simplifying --cc (compact combined) output?
>
> I am not sure what's more useful to show there. The part of the
> output shows "the list of files that have changed by this
> commit" for non-merge commits, so "the list of files that have
> changed in this merge" is what you would want to show, but there
> are three ways you can define "what changed" for a merge (see
> the message that explains --cc to linux@horizontal I sent
> yesterday). I find "diff --name-status $it^1 $it" the most
> natural semantics for the most of the time, and that is what we
> do for --stat output.
>
> If you want to treat all the parents equally, you could read
> from "diff-tree -c --raw $it" which would show list of files
> that needed file-level merges, along with the blob object names
> from all parents.
We should have whatchanged part corresponding to the patchset
part at least in "commitdiff" view, which means '-c' (and for
the time being perhaps mean '-c' also in patchset part). '--cc'
which uses '-c' for the raw part would be nice...
"Commit" view could use "diff --name-status $hash^1 $hash",
(i.e. I think the same what it does now), even if it is not
compatibile with "commitdiff" view.
--
Jakub Narebski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] gitweb TODO
2006-11-17 20:30 ` Jakub Narebski
@ 2006-11-17 21:08 ` Junio C Hamano
2006-11-17 21:24 ` Jakub Narebski
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-11-17 21:08 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> What about the fact that git-diff -M is _not_ patch-compatibile;
What about it? I've never said patch compatibility is an issue.
We have something patch cannot represent or understand and you
should admit it. The point is to make it easier to massage by
hand, when the recipient does not have git handy.
With -M, the recipient can read and understand the patch text
better than "remove this oldfile and create this newfile that
the diff output does not tell you is related" diff. And we say
"rename" in plain language so the recipient _can_ do "mv A B"
then "patch -p1". Similarly, with -T that changes a symlink
into a real file, if we do not do the current "remove the old
and then create the new" and did instead "show the textual diff
that can be applied", a non-git tool that does not understand
the typechange can mistakenly muck with the target of the
symlink, which is a disaster. "Remove the target and then
create this" at least would have lesser damage -- the object
left as the result is incorrect nevertheless, but reading the
contents and creating a symlink that has that contents by hand
is easily done in a pinch.
> We should have whatchanged part corresponding to the patchset
> part at least in "commitdiff" view, which means '-c' (and for
> the time being perhaps mean '-c' also in patchset part). '--cc'
> which uses '-c' for the raw part would be nice...
I am not sure what you mean by patchset part, but if you are
talking about the multiway diff text, I think most of the time
output from "-c -p" is much less interesting than "--cc".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] gitweb TODO
2006-11-17 21:08 ` Junio C Hamano
@ 2006-11-17 21:24 ` Jakub Narebski
2006-11-18 0:04 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2006-11-17 21:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> What about the fact that git-diff -M is _not_ patch-compatibile;
>
> What about it? I've never said patch compatibility is an issue.
> We have something patch cannot represent or understand and you
> should admit it. The point is to make it easier to massage by
> hand, when the recipient does not have git handy.
>
> With -M, the recipient can read and understand the patch text
> better than "remove this oldfile and create this newfile that
> the diff output does not tell you is related" diff. And we say
> "rename" in plain language so the recipient _can_ do "mv A B"
> then "patch -p1". Similarly, with -T that changes a symlink
> into a real file, if we do not do the current "remove the old
> and then create the new" and did instead "show the textual diff
> that can be applied", a non-git tool that does not understand
> the typechange can mistakenly muck with the target of the
> symlink, which is a disaster. "Remove the target and then
> create this" at least would have lesser damage -- the object
> left as the result is incorrect nevertheless, but reading the
> contents and creating a symlink that has that contents by hand
> is easily done in a pinch.
So we split patch for "type change" mode change for patch -p1
safety. But for gitweb more important than producing safe patch
is producing _readable_ patch[*1*]. Hence request for -T option
(or under other, better name) to git-diff which would _not_
split patch (not create two patches for one raw difftree line).
[*1*] Well, as far as diff for symlink is readable anyway.
>> We should have whatchanged part corresponding to the patchset
>> part at least in "commitdiff" view, which means '-c' (and for
>> the time being perhaps mean '-c' also in patchset part). '--cc'
>> which uses '-c' for the raw part would be nice...
>
> I am not sure what you mean by patchset part, but if you are
> talking about the multiway diff text, I think most of the time
> output from "-c -p" is much less interesting than "--cc".
Sorry, perhaps I was not clear. I'd like for git-diff-tree --cc
--raw -p to output both raw part (perhaps taken from -c) and
patch part[*2*]. Gitweb needs raw part for both whatchanged
and also for creating hyperlinks and gitweb quoted filenames
in the patch part (although with git diff headers buffering
and git diff extended headers parsing we could get the information
needed for hyperlinks and gitweb quoting of filenames for patch
part from patch part).
[*2*] I'm wondering why we don't have --patch long version of -p
option to git-diff[-tree].
--
Jakub Narebski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] gitweb TODO
2006-11-17 21:24 ` Jakub Narebski
@ 2006-11-18 0:04 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2006-11-18 0:04 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> ... But for gitweb more important than producing safe patch
> is producing _readable_ patch[*1*]. Hence request for -T option
> (or under other, better name) to git-diff which would _not_
> split patch (not create two patches for one raw difftree line).
>
> [*1*] Well, as far as diff for symlink is readable anyway.
Honestly, I do not have strong feeling either way. As you say,
I suspect diff to change between symlink and regular file is not
readable no matter how you present it, and it is a corner case
that is not very interesting. It happens in real life but it is
rare enough that split patches or a single patch would not make
much difference either way.
So I would not oppose to a patch to add an option to update
git-diff to produce either format, but I doubt it is worth the
effort required to make sure that the change does not break
anything else and also to make matching adjustment to git-apply,
so that it can understand both formats.
>> I am not sure what you mean by patchset part, but if you are
>> talking about the multiway diff text, I think most of the time
>> output from "-c -p" is much less interesting than "--cc".
>
> Sorry, perhaps I was not clear. I'd like for git-diff-tree --cc
> --raw -p to output both raw part (perhaps taken from -c) and
> patch part...
I can see why somebody might want to do this, to exactly the
same degree that I can see why somebody might want to use the
combination of "--raw -p".
I think this would work in git.git itself:
git diff-tree --cc --numstat --raw -p v1.0.0
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-11-18 0:04 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-17 18:01 [RFC] gitweb TODO Jakub Narebski
2006-11-17 19:22 ` Junio C Hamano
2006-11-17 20:30 ` Jakub Narebski
2006-11-17 21:08 ` Junio C Hamano
2006-11-17 21:24 ` Jakub Narebski
2006-11-18 0:04 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.