* Re: Unresolved issues #2
From: Junio C Hamano @ 2006-05-07 9:42 UTC (permalink / raw)
To: git
In-Reply-To: <7vy7xekwbs.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> writes:
> But what about prefersymlinkrefs one? When setting the variable
> to such a non-standard value, it is unreasonable for people to
> want to justify why with a comment like the above.
Obviously I was not reading what I was typing. It is very
reasonable for people to want to do that. Sorry.
^ permalink raw reply
* Re: Unresolved issues #2
From: Junio C Hamano @ 2006-05-07 9:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605062332420.6423@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> It was done because the very syntax of the config suggests it be a
> user-editable file. I do not want to mess with the comments more than
> necessary.
I personally feel that is a lost cause _unless_ you come up with
a reasonable convention for where to put comments, stress that
rule to the user in the documentation, _and_ make repo-config to
follow that rule as well.
We _do_ want to treat config file as hand editable and cat
reviewable file, not an unreadable gunk like xml, so trying to
preserve user comments is important and I am not opposed to that
you did (at least some of) it. But as the code currently
stands, what it does is at best half baked, at worst somewhat
confusing.
A demonstration. What is wrong with this picture?
$ cat .git/config
[core]
repositoryformatversion = 0
; are the mode bits trustworthy?
filemode = true ; yes, on ext3
; We want symlinked HEAD because we will bisect
; recent kernel history.
prefersymlinkrefs = true
$ git repo-config core.prefersymlinkrefs false
$ git repo-config core.filemode false
$ cat .git/config
[core]
repositoryformatversion = 0
; are the mode bits trustworthy?
filemode = false
; We want symlinked HEAD because we will bisect
; recent kernel history.
prefersymlinkrefs = false
$ exit
The comment given to "filemode" is "reasonable" in that it
describes what the value that is set to the variable does, and
losing the original comment given to its "true" when we set it
to false is better than keeping it, so that part happens to be
doing the right thing -- only because I knew what repo-config
would do to the comments and arranged original comments in the
file that way.
But what about prefersymlinkrefs one? When setting the variable
to such a non-standard value, it is unreasonable for people to
want to justify why with a comment like the above. But after
resetting the value the comment becomes stale.
It gets worse:
$ git repo-config --unset core.filemode
$ cat .git/config
[core]
repositoryformatversion = 0
; please please use symlinks please
prefersymlinkrefs = false
; are the mode bits trustworthy?
$ exit
There now is a confusing trailing comment left that does not
comment anything. Removing core.filemode is not so common, but
this can happen whenever you remove any variable, so we can use
any other variable as an example.
Now, enough being negative and pointing out problems. Time to
become constructive. Probably a reasonable convention would be
to define the config file format to be something like this:
<comment that applies to the section>
[section]
<comment that applies to the variable stands on
its own before the variable>
variable [= value] <comment that applies to the
fact the variable is set to
this particular value starts
on the same line as the
"variable = value" thing>
- when a variable is reset to another value, remove the
"value comment";
- when a variable disappears, remove "variable comment";
- when a section disappears, remove "section comment";
- otherwise leave the comment intact.
Then we could tell the user the rule is like above, and tell
them to structure the file with comments that way, if they ever
want to edit the file by hand.
Now if we wanted to do something like the above, I suspect that
it would be easier and less error prone to first scan the config
file, note what appears where, and do the processing in-core,
and then write the results out, perhaps using data structures
like these:
struct config_section {
char *pre_comment;
char *name; /* e.g. "core" */
struct config_section *next; /* next section */
struct config_var *vars; /* pointer to the first one */
};
struct config_var {
char *pre_comment;
char *name;
char *value; /* "existence" bool may have NULL,
* otherwise probably a string "= value"
*/
char *value_comment;
struct config_var *next; /* pointer to the next one
* in the section
*/
};
Obviously, data structures like these would make it even easier
if we decide we do _not_ care about comments (we would just lose
x_comment fields, parse the thing and write the resulting list
out).
^ permalink raw reply
* Re: [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Junio C Hamano @ 2006-05-07 8:42 UTC (permalink / raw)
To: Jan-Benedict Glaw; +Cc: git
In-Reply-To: <20060507073052.GC17031@lug-owl.de>
Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:
>> + if (in_fd < 0 && ENOENT != errno )
>
> I admit that I don't like the (constant -operator- variable) notation,
> but mixing both in one line..?
Lol.
I would have written it as (in_fd < 0 && errno != ENOENT) BTW.
^ permalink raw reply
* Re: Unresolved issues #2 (shallow clone again)
From: Sergey Vlasov @ 2006-05-07 8:01 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Junio C Hamano, git
In-Reply-To: <46a038f90605062308x53995076k7bf45f0aebcae0c6@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]
On Sun, 7 May 2006 18:08:03 +1200 Martin Langhoff wrote:
> On 5/6/06, Junio C Hamano <junkio@cox.net> wrote:
> > "Martin Langhoff" <martin.langhoff@gmail.com> writes:
> > >
> > > It means that for a merge or checkout involving stuff we "don't have",
> > > it's trivial to know we are missing, and so we can attempt a fetch of
> > > the missing objects or tell the user how to request them them before
> > > retrying.
> > >
> > > And in any case commits and trees are lightweight and compress well...
> >
> > Commit maybe, but is this based on a hard fact?
>
> No hard facts here :( but I think it's reasonable to assume that the
> trees delta/compress reasonably well, as a given commit will change
> just a few entries in each tree.
>
> I might try and hack a shallow local clone of the kernel and pack it
> tightly to see what it yields.
For linux v2.6.16:
7,3M commits-b41b04a36afebdba3b70b74f419fc7d97249bd7f.pack
24M commits_trees-8397f1c2a885527acd07e2caa8c95df626451493.pack
97M full-c7b2747a674ff55cb4a59dabebe419f191e360df.pack
For comparizon, a single version in packed form:
51M v2.6.12-rc2-4f3526b6815eb63da6c43ed85be1494bb776e2c5.pack
Made with
git-rev-list v2.6.16 | git-pack-objects commits
git-rev-list --objects --no-blobs v2.6.16 | git-pack-objects commits_trees
git-rev-list --objects v2.6.16 | git-pack-objects full
and this hack to git-rev-list:
diff --git a/revision.c b/revision.c
index f2a9f25..b5a929e 100644
--- a/revision.c
+++ b/revision.c
@@ -636,6 +636,10 @@ int setup_revisions(int argc, const char
revs->blob_objects = 1;
continue;
}
+ if (!strcmp(arg, "--no-blobs")) {
+ revs->blob_objects = 0;
+ continue;
+ }
if (!strcmp(arg, "--objects-edge")) {
revs->tag_objects = 1;
revs->tree_objects = 1;
So trees are definitely not lightweight, and commits are rather large
too.
[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]
^ permalink raw reply related
* Re: Unresolved issues #2 (shallow clone again)
From: Jeff King @ 2006-05-07 7:56 UTC (permalink / raw)
To: git
In-Reply-To: <46a038f90605062308x53995076k7bf45f0aebcae0c6@mail.gmail.com>
On Sun, May 07, 2006 at 06:08:03PM +1200, Martin Langhoff wrote:
> >> And in any case commits and trees are lightweight and compress well...
> >Commit maybe, but is this based on a hard fact?
> No hard facts here :( but I think it's reasonable to assume that the
> trees delta/compress reasonably well, as a given commit will change
> just a few entries in each tree.
A few hard facts (using Linus' linux-2.6 tree):
- original packsize: 120996 kilobytes
- unpacked: 233338 objects, 1417476 kilobytes
This is an 11.7:1 compression ratio (of course, much of this is
wasted space from the 4k block size in the filesystem)
- There were 87915 total blob objects, of which 19321 were in the
current tree. I removed all non-current blobs to produce a "shallow"
tree.
- The shallow tree unpacked: 164744 objects, 761960 kilobytes
IOW, about half of the unpacked disk usage was old blobs.
- Shallow commit/tree/tag objects packed (using 1.3.1
git-pack-objects):
Total 164744, written 164744 (delta 92322), reused 0 (delta 0)
size: 108088
The compression ratio here is only 7.0:1
- Total savings by going shallow: 10.7%
So basically, trees and commits DON'T compress as well as historical
blobs (potentially because git-pack-objects isn't currently optimized
for this -- I haven't checked). As a result, we're saving only 10% by
going shallow instead of a potential 50%.
-Peff
^ permalink raw reply
* Re: [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Jan-Benedict Glaw @ 2006-05-07 7:30 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605070144530.7578@wbgn013.biozentrum.uni-wuerzburg.de>
[-- Attachment #1: Type: text/plain, Size: 820 bytes --]
On Sun, 2006-05-07 01:45:22 +0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> diff --git a/config.c b/config.c
> index 6765186..452b587 100644
> --- a/config.c
> +++ b/config.c
> @@ -261,6 +261,10 @@ int git_config_from_file(config_fn_t fn,
> config_offset = 0;
>
> in_fd = open(filename, O_RDONLY);
> + if (in_fd < 0 && ENOENT != errno )
I admit that I don't like the (constant -operator- variable) notation,
but mixing both in one line..?
MfG, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
für einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* Re: Unresolved issues #2 (shallow clone again)
From: Martin Langhoff @ 2006-05-07 6:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbqubvdbr.fsf@assigned-by-dhcp.cox.net>
On 5/6/06, Junio C Hamano <junkio@cox.net> wrote:
> "Martin Langhoff" <martin.langhoff@gmail.com> writes:
> >
> > It means that for a merge or checkout involving stuff we "don't have",
> > it's trivial to know we are missing, and so we can attempt a fetch of
> > the missing objects or tell the user how to request them them before
> > retrying.
> >
> > And in any case commits and trees are lightweight and compress well...
>
> Commit maybe, but is this based on a hard fact?
No hard facts here :( but I think it's reasonable to assume that the
trees delta/compress reasonably well, as a given commit will change
just a few entries in each tree.
I might try and hack a shallow local clone of the kernel and pack it
tightly to see what it yields.
cheers,
martin
^ permalink raw reply
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Junio C Hamano @ 2006-05-07 4:50 UTC (permalink / raw)
To: git
In-Reply-To: <7viroipijf.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> writes:
> int match_pathspec(const char *path, int len,
> struct pathspec **pathspec);
>
> See if the path (with length) matches the given spec.
> path[len-1] == '/' signals that the caller is traversing
> a tree and checking if it is worth descending into the
> tree.
> ...
> and when traversing the tree for A and B, upon seeing "Documentation"
> entry in the topleve tree object buffers, call:
>
> path_match("Documentation", 14, 1, spec)
Oops, match_pathspec("Documentation/", 15, spec) is what I meant
here. Likewise in the later examples.
^ permalink raw reply
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Junio C Hamano @ 2006-05-07 4:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605061532190.16343@g5.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> On Sun, 7 May 2006, Johannes Schindelin wrote:
>>
>> This might be the wrong way to do it, but as it is without this patch,
>> get_pathspec() is leaking memory.
>
> I'm not at all convinced we want to do somethng like this.
>
> get_pathspec() is a one-time event. It doesn't "leak" memory. To me,
> "leaking" is when you continually lose a bit of memory, and you eventually
> run out. I don't see that happening here.
I agree with that, except that blame does use it for each
commit and loses memory with deep history.
I have been contemplating to revamp pathspec stuff to deal with
not just an array pointers to strings. tree-diff already uses a
parallel array of ints (pathlens) to optimize away the repeated
use of strlen() for each pathspec elements, and I think we would
be better off using something like that everywhere. The API I
have in mind goes like this:
struct pathspec; /* opaque */
A type opaque to the caller.
struct pathspec **get_pathspec(const char *prefix,
const char **pathspec,
int wildcard);
The caller gives the prefix (return value from
setup_git_directory(), the user supplied pathspec list,
and if it wants to use ls-files style wildcard or
tree-diff style directory prefix bahaviour. A newly
allocated array is returned and the caller can free() it
when done.
int match_pathspec(const char *path, int len,
struct pathspec **pathspec);
See if the path (with length) matches the given spec.
path[len-1] == '/' signals that the caller is traversing
a tree and checking if it is worth descending into the
tree. In that case, original spec string "foo/bar/baz"
matches path = "foo/" (with len = 4). Otherwise the full
path is checked so that original spec string would match
path = "foo/bar/baz" (with len = 11), but not "foo"
(with len = 3). If the get_pathspec() was called with
wildcard support, spec string "foo/bar*" matches these:
"foo/" (i.e. should descend into it),
"foo/barboz/" (ditto)
"foo/bar.txt" (matches)
but not these:
"fob/" (no point descending into it)
"foo/bax" (does not match)
A wildcard aware diff-tree, when invoked like this:
cd Documentation
git-diff-tree -r A B -- 'ho*'
might do:
struct pathspec **spec;
const char *(paths[2]);
paths[0] = "how*"; paths[1] = NULL;
spec = get_pathspec("Documentation/", argv, 1);
and when traversing the tree for A and B, upon seeing "Documentation"
entry in the topleve tree object buffers, call:
path_match("Documentation", 14, 1, spec)
which would return true (worth descending into the directory).
The recursive call would, upon seeing "hooks.txt" entry, call:
path_match("Documentation/hooks.txt", 23, 0, spec)
which would say true and compares the entry from two trees.
Also, the same recursive invocation would call
path_match("Documentation/howto", 19, 1, spec)
which would return true to cause it further recurse into it.
^ permalink raw reply
* Locking a branch
From: Pavel Roskin @ 2006-05-07 2:50 UTC (permalink / raw)
To: git
Hello!
I see there is an active discussion about branch attributes. It would
be nice to have an attribute to prevent git from changing the branch
head in any way. The reason is that it interferes with StGIT on StGIT
managed branches. If StGIT is fine with the change, it would remove or
override the lock temporarily. StGIT could also unlock the branch
permanently if there are no applied patches.
Another use of the branch lock would be to prevent damage to remote
branches, such as "origin". Committing anything to "origin" would break
fetching. In this case, git-fetch should be allowed to override the
lock after checking .git/remotes, but other git commands should respect
the lock.
It should be possible to break locks, but the users would at least think
before doing so.
--
Regards,
Pavel Roskin
^ permalink raw reply
* Re: [RFC] Managing projects - advanced Git tutorial/walkthrough
From: J. Bruce Fields @ 2006-05-07 1:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jakub Narebski, git
In-Reply-To: <7v64kisyow.fsf@assigned-by-dhcp.cox.net>
On Sat, May 06, 2006 at 01:09:03PM -0700, Junio C Hamano wrote:
> - Understanding git as an end user. Currently, this is
> included in the global map git(7) documentation. It might
> make sense to separate it out. This should talk about
> concepts like blobs/trees/commits/trust/index without going
> into lowlevel details of the implementation. The stress
> should be on what they are for, not operationally but
> philosophically. What's currently in README would be
> suitable for this part, with some additional topics:
Yeah, I actually made a start at a sequel to tutorial.txt, with the goal
that after reading the sequel a user would have encountered the main
concepts necessary to read any of the man pages--mainly the object
database and the index file. My work so far is in the
"advanced-tutorial" branch of
git://linux-nfs.org/~bfields/exports/git.git
but it needs some cleaning up.
I was hoping I'd be able to replace some of the README or
core-tutorial.txt in the process, but the latter has a lot of
git-hacker-only detail in it, and the former is a bit more verbose and
has some motivation (explaining why stuff was designed the way it was)
that is nice but maybe not necessary for a minimal tutorial.
--b.
^ permalink raw reply
* Re: [PATCH] fmt-patch: understand old <his> notation
From: Junio C Hamano @ 2006-05-07 1:31 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.64.0605061527030.16343@g5.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> Maybe not. I've actually cursed the fact that I made "git diff X" mean
> "diff from X to current working tree", because it almost never makes any
> sense at all when X is anything but "HEAD".
Actually, I do "git diff next" all the time while on the tip of
my topic branches, and also when merging a topic branch into
"master". This "a different tree with the current working
files" is probably the second most frequently used form for me
(the first one is of course HEAD vs working files).
^ permalink raw reply
* Re: [PATCH] fmt-patch: understand old <his> notation
From: Junio C Hamano @ 2006-05-07 1:28 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605062358280.6357@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> No, it is not too late. I did this patch only to prevent cluttering my
> directory with millions of patches, only because I forgot _again_ to type
> the two dots.
I did that as well, and the thing is, this is _so_ fast that
when I noticed and typed ^C, it already has done 400 or so
commits starting from the epoch (which was empty by the way).
I am already working on adjusting in-tree format-patch users not
to use <his> syntax but to use <his>.. syntax, so this should not
be a problem either way.
>> Also we probably would want to default the diff options to show
>> the root commit diff (rev.show_root_diff).
> I gather this is needed for git-am/git-rebase to continue working?
No, but people want to export the whole history sometimes.
^ permalink raw reply
* Re: Unresolved issues #2
From: Jakub Narebski @ 2006-05-07 0:41 UTC (permalink / raw)
To: git
In-Reply-To: <7vvesirh0q.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano wrote:
> It might make sense to rewrite it to parse and read the existing
> configuration as a whole, do necessary manupulations on the
> parsed internal representation in-core, and write the result out
> from scratch.
Or perhaps do git repo-config read and change config file in two passes:
read and build some kind of index (beginning of section, end of
section/last variable in section, number of elements in section), then in
second pass add some information if needed.
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply
* Re: [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Johannes Schindelin @ 2006-05-06 23:45 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.63.0605070125010.6597@wbgn013.biozentrum.uni-wuerzburg.de>
Hi,
sorry: bad patch. This is needed on top.
diff --git a/config.c b/config.c
index 6765186..452b587 100644
--- a/config.c
+++ b/config.c
@@ -261,6 +261,10 @@ int git_config_from_file(config_fn_t fn,
config_offset = 0;
in_fd = open(filename, O_RDONLY);
+ if (in_fd < 0 && ENOENT != errno )
+ die("opening %s: %s", config_file_name,
+ strerror(errno));
+
fstat(in_fd, &st);
if (contents) {
@@ -288,9 +292,6 @@ int git_config_from_file(config_fn_t fn,
} else {
contents = NULL;
config_length = 0;
- if (in_fd < 0 && ENOENT != errno )
- die("opening %s: %s", config_file_name,
- strerror(errno));
}
return ret;
^ permalink raw reply related
* [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Johannes Schindelin @ 2006-05-06 23:26 UTC (permalink / raw)
To: git
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
This is extremely paranoic, I know.
config.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/config.c b/config.c
index 05d4d8c..6765186 100644
--- a/config.c
+++ b/config.c
@@ -13,6 +13,7 @@ #define MAXNAME (256)
static const char *contents = NULL;
static int config_length = 0, config_offset = 0;
static const char *config_file_name;
+static time_t config_file_mtime = 0;
static int config_linenr;
static int get_next_char(void)
{
@@ -255,23 +256,26 @@ int git_default_config(const char *var,
int git_config_from_file(config_fn_t fn, const char *filename)
{
int ret, in_fd;
+ struct stat st;
config_offset = 0;
+ in_fd = open(filename, O_RDONLY);
+ fstat(in_fd, &st);
+
if (contents) {
- if (!strcmp(config_file_name, filename))
+ if (!strcmp(config_file_name, filename)
+ && config_file_mtime == st.st_mtime
+ && config_length == st.st_size) {
+ close(in_fd);
return git_parse_file(fn);
+ }
munmap((char*)contents, config_length);
free((char*)config_file_name);
}
- in_fd = open(filename, O_RDONLY);
-
ret = -1;
if (in_fd > 0) {
- struct stat st;
-
- fstat(in_fd, &st);
config_length = st.st_size;
contents = mmap(NULL, config_length, PROT_READ, MAP_PRIVATE,
in_fd, 0);
--
1.3.1.g5545a
^ permalink raw reply related
* Re: [PATCH] fmt-patch: understand old <his> notation
From: Olivier Galibert @ 2006-05-06 23:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605061527030.16343@g5.osdl.org>
On Sat, May 06, 2006 at 03:30:38PM -0700, Linus Torvalds wrote:
> Maybe not. I've actually cursed the fact that I made "git diff X" mean
> "diff from X to current working tree", because it almost never makes any
> sense at all when X is anything but "HEAD".
>
> I probably _should_ have made "git diff X" mean basically "git show X",
> and not what it means now.
Funny, when tracking other projects I use (sometimes path-filtered)
"git diff origin" often, but I find "git show origin" utterly
uninteresting.
OG.
^ permalink raw reply
* [PATCH] config: mmap() the contents of the config file
From: Johannes Schindelin @ 2006-05-06 23:04 UTC (permalink / raw)
To: git, junkio
This makes it possible to rewrite the config without accessing the config
file twice.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
Linus said he'd prefer this.
One thing I am not quite sure about: should we remember the file's
mtime and check it to see if the config file has changed since we
mmap()ed it?
BTW this patch is on top of next, and does not yet contain Sean's
bug fix.
config.c | 102 ++++++++++++++++++++++++++++++++------------------------------
1 files changed, 52 insertions(+), 50 deletions(-)
diff --git a/config.c b/config.c
index 87fb220..05d4d8c 100644
--- a/config.c
+++ b/config.c
@@ -10,31 +10,27 @@ #include <regex.h>
#define MAXNAME (256)
-static FILE *config_file;
+static const char *contents = NULL;
+static int config_length = 0, config_offset = 0;
static const char *config_file_name;
static int config_linenr;
static int get_next_char(void)
{
int c;
- FILE *f;
c = '\n';
- if ((f = config_file) != NULL) {
- c = fgetc(f);
- if (c == '\r') {
+ if (config_offset < config_length) {
+ c = contents[config_offset++];
+ if (c == '\r' && config_offset < config_length) {
/* DOS like systems */
- c = fgetc(f);
+ c = contents[config_offset++];
if (c != '\n') {
- ungetc(c, f);
+ config_offset--;
c = '\r';
}
}
if (c == '\n')
config_linenr++;
- if (c == EOF) {
- config_file = NULL;
- c = '\n';
- }
}
return c;
}
@@ -162,7 +158,7 @@ static int git_parse_file(config_fn_t fn
int c = get_next_char();
if (c == '\n') {
/* EOF? */
- if (!config_file)
+ if (config_offset >= config_length)
return 0;
comment = 0;
continue;
@@ -258,18 +254,41 @@ int git_default_config(const char *var,
int git_config_from_file(config_fn_t fn, const char *filename)
{
- int ret;
- FILE *f = fopen(filename, "r");
+ int ret, in_fd;
+
+ config_offset = 0;
+
+ if (contents) {
+ if (!strcmp(config_file_name, filename))
+ return git_parse_file(fn);
+ munmap((char*)contents, config_length);
+ free((char*)config_file_name);
+ }
+
+ in_fd = open(filename, O_RDONLY);
ret = -1;
- if (f) {
- config_file = f;
- config_file_name = filename;
+ if (in_fd > 0) {
+ struct stat st;
+
+ fstat(in_fd, &st);
+ config_length = st.st_size;
+ contents = mmap(NULL, config_length, PROT_READ, MAP_PRIVATE,
+ in_fd, 0);
+ close(in_fd);
+
+ config_file_name = strdup(filename);
config_linenr = 1;
+ config_offset = 0;
ret = git_parse_file(fn);
- fclose(f);
- config_file_name = NULL;
+ } else {
+ contents = NULL;
+ config_length = 0;
+ if (in_fd < 0 && ENOENT != errno )
+ die("opening %s: %s", config_file_name,
+ strerror(errno));
}
+
return ret;
}
@@ -317,7 +336,7 @@ static int store_aux(const char* key, co
return 1;
}
- store.offset[store.seen] = ftell(config_file);
+ store.offset[store.seen] = config_offset;
store.seen++;
}
break;
@@ -327,12 +346,12 @@ static int store_aux(const char* key, co
break;
} else
/* do not increment matches: this is no match */
- store.offset[store.seen] = ftell(config_file);
+ store.offset[store.seen] = config_offset;
/* fallthru */
case SECTION_END_SEEN:
case START:
if (matches(key, value)) {
- store.offset[store.seen] = ftell(config_file);
+ store.offset[store.seen] = config_offset;
store.state = KEY_SEEN;
store.seen++;
} else if (strrchr(key, '.') - key == store.baselen &&
@@ -367,10 +386,9 @@ static void store_write_pair(int fd, con
write(fd, "\n", 1);
}
-static int find_beginning_of_line(const char* contents, int size,
- int offset_, int* found_bracket)
+static int find_beginning_of_line(int offset_, int* found_bracket)
{
- int equal_offset = size, bracket_offset = size;
+ int equal_offset = config_length, bracket_offset = config_length;
int offset;
for (offset = offset_-2; offset > 0
@@ -420,7 +438,7 @@ int git_config_set_multivar(const char*
const char* value_regex, int multi_replace)
{
int i;
- int fd, in_fd;
+ int fd;
int ret;
char* config_filename = strdup(git_path("config"));
char* lock_file = strdup(git_path("config.lock"));
@@ -471,18 +489,9 @@ int git_config_set_multivar(const char*
/*
* If .git/config does not exist yet, write a minimal version.
*/
- in_fd = open(config_filename, O_RDONLY);
- if ( in_fd < 0 ) {
+ if (!contents) {
free(store.key);
- if ( ENOENT != errno ) {
- error("opening %s: %s", config_filename,
- strerror(errno));
- close(fd);
- unlink(lock_file);
- ret = 3; /* same as "invalid config file" */
- goto out_free;
- }
/* if nothing to unset, error out */
if (value == NULL) {
close(fd);
@@ -494,9 +503,7 @@ int git_config_set_multivar(const char*
store.key = (char*)key;
store_write_section(fd, key);
store_write_pair(fd, key, value);
- } else{
- struct stat st;
- char* contents;
+ } else {
int i, copy_begin, copy_end, new_line = 0;
if (value_regex == NULL)
@@ -555,23 +562,17 @@ int git_config_set_multivar(const char*
goto out_free;
}
- fstat(in_fd, &st);
- contents = mmap(NULL, st.st_size, PROT_READ,
- MAP_PRIVATE, in_fd, 0);
- close(in_fd);
-
if (store.seen == 0)
store.seen = 1;
for (i = 0, copy_begin = 0; i < store.seen; i++) {
if (store.offset[i] == 0) {
- store.offset[i] = copy_end = st.st_size;
+ store.offset[i] = copy_end = config_length;
} else if (store.state != KEY_SEEN) {
copy_end = store.offset[i];
} else
copy_end = find_beginning_of_line(
- contents, st.st_size,
- store.offset[i]-2, &new_line);
+ store.offset[i] - 2, &new_line);
/* write the first part of the config */
if (copy_end > copy_begin) {
@@ -591,11 +592,12 @@ int git_config_set_multivar(const char*
}
/* write the rest of the config */
- if (copy_begin < st.st_size)
+ if (copy_begin < config_length)
write(fd, contents + copy_begin,
- st.st_size - copy_begin);
+ config_length - copy_begin);
- munmap(contents, st.st_size);
+ munmap((char*)contents, config_length);
+ contents = NULL;
unlink(config_filename);
}
--
1.3.1.g5545a
^ permalink raw reply related
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Johannes Schindelin @ 2006-05-06 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, junkio
In-Reply-To: <Pine.LNX.4.64.0605061532190.16343@g5.osdl.org>
Hi,
On Sat, 6 May 2006, Linus Torvalds wrote:
> On Sun, 7 May 2006, Johannes Schindelin wrote:
> >
> > This might be the wrong way to do it, but as it is without this patch,
> > get_pathspec() is leaking memory.
>
> I'm not at all convinced we want to do somethng like this.
>
> get_pathspec() is a one-time event. It doesn't "leak" memory. To me,
> "leaking" is when you continually lose a bit of memory, and you eventually
> run out. I don't see that happening here.
I see your point. That was exactly why I put "RFC" and not "PATCH" on the
subject.
> So there's a difference between "don't care" and "leak memory". It sounds
> like you may be using some automated tool that warns because it simply
> doesn't understand that difference.
Nope. No automated tool. Just my brain which wanted to fix _all_
occurrences of that prefix_path() usage bug.
But in case anybody uses Valgrind et al. on git, how about at least a
comment telling the casual observer why we don't free() the buffers?
Ciao,
Dscho
^ permalink raw reply
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Linus Torvalds @ 2006-05-06 22:37 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, junkio
In-Reply-To: <Pine.LNX.4.63.0605070003430.6357@wbgn013.biozentrum.uni-wuerzburg.de>
On Sun, 7 May 2006, Johannes Schindelin wrote:
>
> This might be the wrong way to do it, but as it is without this patch,
> get_pathspec() is leaking memory.
I'm not at all convinced we want to do somethng like this.
get_pathspec() is a one-time event. It doesn't "leak" memory. To me,
"leaking" is when you continually lose a bit of memory, and you eventually
run out. I don't see that happening here.
It's like saying that calling an initialization routine "leaks" memory,
because it doesn't free the memory that contains the function that is now
no longer ever used. It's clearly leaving that memory unused, "dangling".
But the fact is, the OS will free the memory for us when we exit, and
simplicity is often much more powerful than trying to be overly careful.
So there's a difference between "don't care" and "leak memory". It sounds
like you may be using some automated tool that warns because it simply
doesn't understand that difference.
Linus
^ permalink raw reply
* Re: [PATCH] fmt-patch: understand old <his> notation
From: Linus Torvalds @ 2006-05-06 22:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
In-Reply-To: <7viroirfur.fsf@assigned-by-dhcp.cox.net>
On Sat, 6 May 2006, Junio C Hamano wrote:
>
> While this would be easier on _my_ fingers as well, I have a
> suspicion that it might make more sense to make this "single
> ref" case to mean "HEAD~5^..HEAD~5" (if we _were_ designing a
> new command that is called format-patch today, that would be
> more natural).
Careful, that "X^..X" thing does entirely the wrong thing for merges. Not
what you want at all, I think.
> But probably it is too late to break it by now.
Maybe not. I've actually cursed the fact that I made "git diff X" mean
"diff from X to current working tree", because it almost never makes any
sense at all when X is anything but "HEAD".
I probably _should_ have made "git diff X" mean basically "git show X",
and not what it means now.
Oh, well.
Linus
^ permalink raw reply
* [RFC] get_pathspec(): free() old buffer if rewriting
From: Johannes Schindelin @ 2006-05-06 22:04 UTC (permalink / raw)
To: git, junkio
This might be the wrong way to do it, but as it is without this patch,
get_pathspec() is leaking memory.
However, it is by no means guaranteed that the input is malloc()ed. The
tests run through without problems, but you never know...
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
setup.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/setup.c b/setup.c
index fe7f884..9c39d6e 100644
--- a/setup.c
+++ b/setup.c
@@ -126,6 +126,11 @@ const char **get_pathspec(const char *pr
prefixlen = prefix ? strlen(prefix) : 0;
do {
*p = prefix_path(prefix, prefixlen, entry);
+ if (*p != entry) {
+ if (*p > entry && *p < entry + strlen(entry))
+ *p = strdup(*p);
+ free((char*)entry);
+ }
} while ((entry = *++p) != NULL);
return (const char **) pathspec;
}
--
1.3.2.g9ba6-dirty
^ permalink raw reply related
* [PATCH] Fix users of prefix_path() to free() only when necessary
From: Johannes Schindelin @ 2006-05-06 22:02 UTC (permalink / raw)
To: git, junkio
Unfortunately, prefix_path() sometimes returns a newly xmalloc()ed buffer,
and in other cases it returns a substring!
For example, when calling
git update-index ./hello.txt
prefix_path() returns "hello.txt", but does not allocate a new buffer. The
original code only checked if the result of prefix_path() was different from
what was passed in, and thusly trigger a segmentation fault.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
checkout-index.c | 4 ++--
update-index.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/checkout-index.c b/checkout-index.c
index 64bdc3b..9876af6 100644
--- a/checkout-index.c
+++ b/checkout-index.c
@@ -278,7 +278,7 @@ int main(int argc, char **argv)
die("git-checkout-index: don't mix '--stdin' and explicit filenames");
p = prefix_path(prefix, prefix_length, arg);
checkout_file(p);
- if (p != arg)
+ if (p < arg || p > arg + strlen(arg))
free((char*)p);
}
@@ -300,7 +300,7 @@ int main(int argc, char **argv)
path_name = buf.buf;
p = prefix_path(prefix, prefix_length, path_name);
checkout_file(p);
- if (p != path_name)
+ if (p < path_name || p > path_name + strlen(path_name))
free((char *)p);
if (path_name != buf.buf)
free(path_name);
diff --git a/update-index.c b/update-index.c
index 7db67aa..f6b09a4 100644
--- a/update-index.c
+++ b/update-index.c
@@ -393,7 +393,7 @@ static void update_one(const char *path,
die("Unable to process file %s", path);
report("add '%s'", path);
free_return:
- if (p != path)
+ if (p < path || p > path + strlen(path))
free((char*)p);
}
@@ -609,7 +609,7 @@ static int do_unresolve(int ac, const ch
const char *arg = av[i];
const char *p = prefix_path(prefix, prefix_length, arg);
err |= unresolve_one(p);
- if (p != arg)
+ if (p < arg || p > arg + strlen(arg))
free((char*)p);
}
return err;
@@ -623,7 +623,7 @@ static int do_reupdate(int ac, const cha
*/
int pos;
int has_head = 1;
- char **pathspec = get_pathspec(prefix, av + 1);
+ const char **pathspec = get_pathspec(prefix, av + 1);
if (read_ref(git_path("HEAD"), head_sha1))
/* If there is no HEAD, that means it is an initial
@@ -815,7 +815,7 @@ int main(int argc, const char **argv)
update_one(p, NULL, 0);
if (set_executable_bit)
chmod_path(set_executable_bit, p);
- if (p != path_name)
+ if (p < path_name || p > path_name + strlen(path_name))
free((char*) p);
if (path_name != buf.buf)
free(path_name);
--
1.3.2.g9ba6-dirty
^ permalink raw reply related
* Re: [PATCH] fmt-patch: understand old <his> notation
From: Johannes Schindelin @ 2006-05-06 22:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7viroirfur.fsf@assigned-by-dhcp.cox.net>
Hi,
On Sat, 6 May 2006, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > When calling "git fmt-patch HEAD~5", you now get the same as if you would
> > have said "git fmt-patch HEAD~5..". This makes it easier for my fingers
> > which are so used to the old syntax.
>
> While this would be easier on _my_ fingers as well, I have a
> suspicion that it might make more sense to make this "single
> ref" case to mean "HEAD~5^..HEAD~5" (if we _were_ designing a
> new command that is called format-patch today, that would be
> more natural). But probably it is too late to break it by now.
No, it is not too late. I did this patch only to prevent cluttering my
directory with millions of patches, only because I forgot _again_ to type
the two dots.
> > I wonder: would it make sense to make add_pending_object() and
> > get_reference() in revision.c non-static?
>
> I'd rather not expose such revision.c internals too much. An
> alternative approach would be to give instruction to revision.c
> (read: another flag like rev.no_walk) to tell it to do something
> special when the user has only one commit, but I think what you
> did in your patch is cleaner and sufficient.
I just stole the function add_head() from builtin-diff.c, but that feels
wrong. I think adding a pending object should not be internal to
revision.c.
> Also we probably would want to default the diff options to show
> the root commit diff (rev.show_root_diff).
I gather this is needed for git-am/git-rebase to continue working?
Ciao,
Dscho
^ permalink raw reply
* Re: [RFC] Managing projects - advanced Git tutorial/walkthrough
From: sean @ 2006-05-06 21:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: jnareb, git
In-Reply-To: <7v64kisyow.fsf@assigned-by-dhcp.cox.net>
On Sat, 06 May 2006 13:09:03 -0700
Junio C Hamano <junkio@cox.net> wrote:
[...]
> I think tutorial.txt is the right "your first experience with
> git with walkthru" document, and the materials it gives, and the
> order in which it introduces them, are very well thought out;
> kudos to JBF. We might also want to show "git grep", but other
> than that I do not think of anything that a new user might want
> to use on the first day.
It doesn't mention git log, show or status which are important
for the first day. Also an example of git commit --amend would
be a nice touch.
Part of the problem people new to git are having arise by reading
documentation in the wrong order and coming to the conclusion that
git is an ugly beast. Also many people are still finding out-of-
date information before anything else (eg. git isn't an scm only an
object tracker).
Part of this will be solved by having a useful and inviting web page
(thanks Pasky!). But it would also help to rename core-tutorial.txt
to something that doesn't sound inviting to newbies. Something
along the line of core-internal-design.txt (or here-be-dragons.txt).
Turning the main man page into more of a reference than a tutorial
slash concepts page would be something worth doing too. Removing
all of the plumbing commands from that page should at least be
considered.
[...]
> Then reorganize the initial part core-tutorial.txt to match the
> examples tutorial.txt gives, and demonstrate what is happening
> under the hood. The tutorial says "git init-db" then "git
> add". The core-tutorial would match that and explain what
> happens when "git init-db" is run (creates .git/objects etc.)
> and "git add" is run (populates the index).
>
There really seems to be a lot of room for another intermediate user
level document between tutorial.txt and the current core-tutorial.txt.
There are lots of concepts that can be explained without having to get
into the low level design or tools of git. For example, a user could
have long been productive with git before ever having to learn about
read-tree/write-tree etc.. all those commands are more for porcelain
writers and git guru/experts now.
Sean
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox