Git development

* Re: "warning: no common commits" triggered due to change of remote's IP address?
From: Thomas Rast @ 2009-03-02  8:40 UTC
  To: Brent Goodrick; +Cc: git
In-Reply-To: <e38bce640903011501t2c7a134dp887f5a96db3db0f4@mail.gmail.com>

Brent Goodrick wrote:
> On Sun, Mar 1, 2009 at 1:20 PM, Thomas Rast <trast@student.ethz.ch> wrote:
> > [...] have you rewritten the repo hosting 'home' between
> > two fetches?  Using (especially, but not only) git-filter-branch can
> > easily render your history disjoint from the pre-filtering state.
> 
> Hmmm, maybe, without knowing it.

It's rather hard to rewrite history without knowing it; there are big
warnings all over the relevant tools' manpages...

> Originally, that section of the
> .git/config file had "*"'s where "home" was. To clarify, the original
> was:
> 
> [remote "origin"]
> 	url = <some_ip_address>:git.repos/environ.git
> 	fetch = +refs/heads/*:refs/remotes/origin/*
> 
> and the current one is now:
> 
> [remote "origin"]
> 	url = <some_ip_address>:git.repos/environ.git
> 	fetch = +refs/heads/home:refs/remotes/origin/home
> 
> Maybe I had made that change and this is the first time I am doing a
> fetch using that change. I am thinking that was the cause of this,
> because I retried doing a fetch into a separate throw-away repo with
> just the change of IP address, and it did not need to fetch anything
> more. I had not executed git-filter-branch at all.

Ironically I cannot reproduce this except with my "own" version that
includes the patch I posted yesterday.  I'll have to look into why it
fails to list any refs to the remote.  In the meantime please
disregard that patch.

If you still have a repo that can reproduce the problem, please keep a
copy for future investigation, and then try

  git fetch-pack -v $url refs/remotes/origin/home 2>&1 \
  | git name-rev --stdin

The -v will dump a lot of output about the common commit search.  The
message "giving up" indicates that you hit the 256 commit limit; if
that doesn't appear, please include the full output so that we can see
where it stops.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: "warning: no common commits" triggered due to change of remote's IP address?
From: Thomas Rast @ 2009-03-02  8:56 UTC
  To: Brent Goodrick; +Cc: git
In-Reply-To: <200903020940.24813.trast@student.ethz.ch>

Thomas Rast wrote:
> If you still have a repo that can reproduce the problem, please keep a
> copy for future investigation, and then try
> 
>   git fetch-pack -v $url refs/remotes/origin/home 2>&1 \
>   | git name-rev --stdin

Actually this should name the remote's idea of the ref, i.e.,

  git fetch-pack -v $url refs/heads/home 2>&1 \
  | git name-rev --stdin

Sorry.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH v2] send-email: add --confirm option
From: Junio C Hamano @ 2009-03-02  9:01 UTC
  To: Nanako Shiraishi; +Cc: Jay Soffian, Paul Gortmaker, git
In-Reply-To: <20090302172401.6117@nanako3.lavabit.com>

Nanako Shiraishi <nanako3@lavabit.com> writes:

> The escape hatch was there from the beginning, is still there, and it
> will remain there. I should also add that it was Junio's veto of
> Linus's proposal to stop installing git-foo commands for builtins
> that enabled this escape hatch.

I think veto is too strong a word to describe what really happened, but in
retrospect, if we went ahead and removed built-ins from the filesystem as
Linus and other people advocated, the escape hatch wouldn't have worked at
all, so in that sense you are correct.  But I do not think I deserve the
credit for that---I do not see myself making an argument based on that
"possible escape-hatch" value in that old thread.

By the way, how are you researching these old discussions?  Do you have
a huge list of bookmarks?

> By the way, I don't think the lesson you should take home is the need
> for an escape hatch. Read the message by Junio on August 24th,
> 2008. Being nice and not too loud during the deprecation period kept
> users complacent about upcoming changes and upset them when the change
> finally came. Being un-nice and too loud during the deprecation period
> would have upset them early instead. You cannot avoid upsetting users
> either way whenever you change the behavior.

Yup.

And the most scary part of all is that you cannot try both.  We now know
that for the 1.6.0 transition people _claimed_ that they would have liked
a louder deprecation period than the way the 1.6.0 transition was handled, but
that is not (and cannot be) backed by real world experience. Nobody tried
versions of git that warned loudly about the upcoming change every time he
typed "git-commit" to see if the louder deprecation period was really
preferable.

We are taking that route for 1.7.0 to warn very loudly about pushing into
the currently checked-out branch in 1.6.2 and onwards.  We may now find
out that people hate a loud deprecation period.  Then what?

* How can I merge some files rather than all files modified on one branch to my branch?
From: Emily Ren @ 2009-03-02  9:19 UTC
  To: git

Hi,

I want to merge some files rather than all files modified on one
branch to my branch. How can I do that?

Thanks,
Emily

* Re: [PATCH v2] send-email: add --confirm option
From: Nanako Shiraishi @ 2009-03-02  9:23 UTC
  To: Junio C Hamano; +Cc: Jay Soffian, Paul Gortmaker, git
In-Reply-To: <7v7i385meo.fsf@gitster.siamese.dyndns.org>

Quoting Junio C Hamano <gitster@pobox.com>:

> By the way, how are you researching these old discussions?  Do you have
> a huge list of bookmarks?

There weren't that many threads that became the turning points for the project; the list need not be huge even if somebody were to keep one.

But I don't keep such a list; I just ask the gmane archive or google.

> And the most scary part of all is that you cannot try both.  We now know
> that for the 1.6.0 transition people _claimed_ that they would have liked
> a louder deprecation period than the way the 1.6.0 transition was handled, but
> that is not (and cannot be) backed by real world experience. Nobody tried
> versions of git that warned loudly about the upcoming change every time he
> typed "git-commit" to see if the louder deprecation period was really
> preferable.
>
> We are taking that route for 1.7.0 to warn very loudly about pushing into
> the currently checked-out branch in 1.6.2 and onwards.  We may now find
> out that people hate a loud deprecation period.  Then what?

Then you get flamed.

But isn't that what the maintainer is there for? To take blame on behalf of others, so that contributors can propose what they genuinely believe are improvements without fearing what the end users would say?

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* Maintainer for autoconf in git (was: [PATCH] autoconf: Add limited support for --htmldir)
From: Jakub Narebski @ 2009-03-02  9:30 UTC
  To: David Syzdek; +Cc: git, Junio C Hamano
In-Reply-To: <9a0027270902280105hcad47c0r30bdd8379932442e@mail.gmail.com>

On Sat, 28 Feb 2009, David Syzdek wrote:

> Are you more or less the maintainer of the configure.ac file?  Or is
> it more of a "hive" effort?  There are a few things that could be done
> to make the file a little more readable and maintainable.  For
> instance, breaking the macro functions into acinclude.m4 instead of
> keeping them in the configure.ac file.
> 
> I'd be willing to help or take the brunt of the work, but I would like
> to coordinate with someone who is familiar with the interaction
> between the Makefile and configure.ac.
> 
> I have a decent amount of experience with using the autotools and am
> comfortable with autoconf.
> 
> Let me know if you think this is a good idea or not.

It is true that I have added [optional] support for autoconf to git,
and I think the idea of having optional ./configure support in the form
of generating a configuration file for the Makefile, overriding the guesswork
based on uname, and being overridden by the user's customization, is mine.

But I have next to no experience (except for the work on git) with
autotools / autoconf. Additionally, keeping configure.ac and
config.mak.in in sync with changes to the Makefile (build system) needs
time, which I don't have much of. So I would very much like someone
with better knowledge of autotools to take over maintaining configure
for git.

The thing to remember is that ./configure has to be entirely optional...

P.S. One of the things that autoconf needs to work better is a fallback
install-sh script in the git sources... which I think would also help in
the case where we do not use ./configure, but are on some legacy
system.
-- 
Jakub Narebski
Poland

* [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Peter Krefting @ 2009-03-02  8:47 UTC
  To: git

When opening a file through open() or fopen(), the path passed is
UTF-8 encoded. To handle this on Windows, we need to convert the
path string to UTF-16 and use the Unicode-based interface.
---
Windows does support file names using arbitrary Unicode characters; you just
need to use its wchar_t interfaces instead of the char ones (the char ones
just get converted into wchar_t at the API level anyway, for the same
reasons). This is the beginning of support for UTF-8 file names in Git on
Windows.

Since there is no real file system abstraction beyond using stdio (AFAIK), I 
need to hack it by replacing fopen (and open). Probably opendir/readdir as 
well (might be trickier), and possibly even hack around main() to parse the 
wchar_t command-line instead of the char copy.

This will lose all chances of Windows 9x compatibility, but I don't know if
there are any attempts at supporting it anyway?

Please note that MultiByteToWideChar() will reject any invalid UTF-8 
strings, perhaps it should just fall back to a regular open()/fopen() in 
that case?

No Signed-off-by line since this is unfinished, just presenting rough sketches
of an idea.

  compat/mingw.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
  compat/mingw.h |    3 ++
  2 files changed, 62 insertions(+), 1 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index e25cb4f..8b19b80 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -9,13 +9,30 @@ int mingw_open (const char *filename, int oflags, ...)
  {
  	va_list args;
  	unsigned mode;
+	wchar_t *unicode_filename;
+	int unicode_filename_len;
  	va_start(args, oflags);
  	mode = va_arg(args, int);
  	va_end(args);

  	if (!strcmp(filename, "/dev/null"))
  		filename = "nul";
-	int fd = open(filename, oflags, mode);
+
+	unicode_filename_len = MultiByteToWideChar(CP_UTF8, 0, filename, -1, NULL, 0);
+	if (0 == unicode_filename_len) {
+		errno = EINVAL;
+		return -1;
+	};
+
+	unicode_filename = xmalloc(unicode_filename_len * sizeof (wchar_t));
+	if (NULL == unicode_filename) {
+		errno = ENOMEM;
+		return -1;
+	}
+	MultiByteToWideChar(CP_UTF8, 0, filename, -1, unicode_filename, unicode_filename_len);
+	int fd = _wopen(unicode_filename, oflags, mode);
+	free(unicode_filename);
+
  	if (fd < 0 && (oflags & O_CREAT) && errno == EACCES) {
  		DWORD attrs = GetFileAttributes(filename);
  		if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))
@@ -24,6 +41,47 @@ int mingw_open (const char *filename, int oflags, ...)
  	return fd;
  }

+FILE *mingw_fopen (const char *filename, const char *mode)
+{
+	wchar_t *unicode_filename, *unicode_mode;
+	int unicode_filename_len, unicode_mode_len;
+	FILE *fh;
+
+	unicode_filename_len = MultiByteToWideChar(CP_UTF8, 0, filename, -1, NULL, 0);
+	if (0 == unicode_filename_len) {
+		errno = EINVAL;
+		return NULL;
+	};
+
+	unicode_filename = xmalloc(unicode_filename_len * sizeof (wchar_t));
+	if (NULL == unicode_filename) {
+		errno = ENOMEM;
+		return NULL;
+	}
+	MultiByteToWideChar(CP_UTF8, 0, filename, -1, unicode_filename, unicode_filename_len);
+
+	unicode_mode_len = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
+	if (0 == unicode_mode_len) {
+		free(unicode_filename);
+		errno = EINVAL;
+		return NULL;
+	};
+
+	unicode_mode = xmalloc(unicode_mode_len * sizeof (wchar_t));
+	if (NULL == unicode_mode) {
+		free(unicode_filename);
+		errno = ENOMEM;
+		return NULL;
+	}
+	MultiByteToWideChar(CP_UTF8, 0, mode, -1, unicode_mode, unicode_mode_len);
+
+	fh = _wfopen(unicode_filename, unicode_mode);
+	free(unicode_filename);
+	free(unicode_mode);
+
+	return fh;
+}
+
  static inline time_t filetime_to_time_t(const FILETIME *ft)
  {
  	long long winTime = ((long long)ft->dwHighDateTime << 32) + ft->dwLowDateTime;
diff --git a/compat/mingw.h b/compat/mingw.h
index 4f275cb..235df0a 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -142,6 +142,9 @@ int sigaction(int sig, struct sigaction *in, struct sigaction *out);
  int mingw_open (const char *filename, int oflags, ...);
  #define open mingw_open

+FILE *mingw_fopen (const char *filename, const char *mode);
+#define fopen mingw_fopen
+
  char *mingw_getcwd(char *pointer, int len);
  #define getcwd mingw_getcwd

-- 
1.6.0.2.1172.ga5ed0

* Re: How can I merge some files rather than all files modified on one branch to my branch?
From: Junio C Hamano @ 2009-03-02 10:04 UTC
  To: Emily Ren; +Cc: git
In-Reply-To: <856bfe0e0903020119y68188a39m90c683949220b2f@mail.gmail.com>

Emily Ren <lingyan.ren@gmail.com> writes:

> I want to merge some files rather than all files modified on one
> branch to my branch. How can I do that?

In general, you do not want to, simply because the result is not really a
merge.

You are rejecting one of the two primary advantages git brings over other
traditional systems: merge tracking [*1*].

But if you really wanted to, here is how.

Suppose you are on branch A, and if you merged branch B in a natural way
it would bring in changes to file1 and file2.  But you do not want any
change to file2.  You can do this:

 (1) Start from a clean state.  No uncommitted changes (if you have some,
     stash them away first).  Merge the branch the usual way:

     $ git merge B

 (2) (1) may get conflicts in file1 and/or file2.  Resolve the conflicts
     only in the file(s) you are interested in (in this example file1).
     Ignore conflicts in files you do not want to get changed in this
     merge.  And then conclude the merge with:

     $ git commit -a

     This step is necessary only if (1) does not cleanly merge; otherwise
     it would have created a merge commit already.

 (3) You did not want changes to certain paths (in this example, file2) in
     the "merge", but we recorded such change in the previous step, so you
     amend it.  At this point HEAD is the merge you created, and HEAD^ is
     before you started the merge.  You want the contents of file2 before
     the merge happened, i.e. from HEAD^:

     $ git checkout HEAD^ file2
     $ git commit --amend

The result would record a "merge" that ignores what B did to file2.
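
A minimal sketch of steps (1) to (3), replayed in a throw-away repository.
The branch names A and B and the files file1 and file2 follow the example
above; the scratch directory, the extra file3, the identity settings and the
commit messages are assumptions made up for illustration. The merge is clean
here, so step (2) is not needed, and --no-edit is added to the amend only so
that the script runs unattended.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

# C: common ancestor containing both files.
echo "base" >file1
echo "base" >file2
git add file1 file2
git commit -q -m "C: base"
git checkout -q -b A

# B: the other branch changes file1 and file2.
git checkout -q -b B
echo "changed on B" >file1
echo "changed on B" >file2
git commit -q -a -m "B: change file1 and file2"

# A: an unrelated commit, so that the merge is not a fast-forward.
git checkout -q A
echo "only on A" >file3
git add file3
git commit -q -m "A: add file3"

# (1) Merge the usual way.
git merge -q -m "M: merge B" B

# (3) Put back the pre-merge file2 and amend the merge commit.
git checkout HEAD^ -- file2
git commit -q --amend --no-edit

echo "file1: $(cat file1)"
echo "file2: $(cat file2)"
git --no-pager log --oneline --graph
```

The amended commit still has two parents, which is why git later treats B as
fully merged even though file2 was left out.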

The reason you do not want to do this is because git tracks merges.

Suppose you have this topology:

             A---M
            /   /
    ---o---C---B

A is where you were before this "merge", B is the other branch, and M is
the result of the "merge".

Now, suppose branch B later improves on what it has done, and now what the
branch has is satisfactory for your needs.  There is no reason you would
not want to have any of its improvements.  You try to merge again.

             A---M-----------?
            /   /           /
    ---o---C---B---D---E---F

Because you declared (when you made the "merge" at M) that anything B did
to file2 was unwanted to your branch, git remembers that declaration.  The
information is used when computing the merge between M and F.  That is
what merge tracking is.

Perhaps changes to file2 when you inspected B were not good enough, but
with improvements made in D, E and/or F it may now be perfect.  But you
are refusing to take this whole sequence of changes, and the merge you
would create between M and F will have only what D/E/F did to file2;
it won't contain what B did.  That makes the sequence of changes B-D-E-F
did to file2 incomplete and inconsistent.

If you are lucky, it will result in huge conflicts and you will notice the
situation.  But if you are not lucky, it may merge cleanly, but because
D/E/F builds on top of what B did, which possibly was half-baked back
then, for the merge result to work as well as F does, you need to have
what B did.  But you won't have it, because you already rejected it at M.
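
The effect can be reproduced with a short script. This is only a sketch under
assumptions: the scratch repository, file3, the identity settings and the
commit messages are invented, and a single later commit D stands in for
D, E and F. It shows that the second merge starts from B's old tip, so B's
original change to file2 is never offered again.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

# C: common ancestor.
echo "base" >file1
echo "base" >file2
git add file1 file2
git commit -q -m "C: base"
git checkout -q -b A

# B changes both files; A gets an unrelated commit.
git checkout -q -b B
echo "changed on B" >file1
echo "changed on B" >file2
git commit -q -a -m "B: change file1 and file2"
git checkout -q A
echo "only on A" >file3
git add file3
git commit -q -m "A: add file3"

# M: merge B, then drop its change to file2 from the merge commit.
git merge -q -m "M: merge B" B
git checkout HEAD^ -- file2
git commit -q --amend --no-edit
b_at_m=$(git rev-parse B)

# D: B moves on without touching file2 again.
git checkout -q B
echo "improved on B" >file1
git commit -q -a -m "D: improve file1"
git checkout -q A

# The next merge is computed against B's old tip, which already had
# B's version of file2, so that change counts as "already merged".
merge_base=$(git merge-base A B)
git merge -q -m "merge B again" B

echo "file1 on A: $(cat file1)"
echo "file2 on A: $(cat file2)"
echo "file2 on B: $(git show B:file2)"
```

The merge completes without conflicts, yet file2 on A keeps its pre-merge
content while B still carries the change.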

Two pieces of advice:

 (1) To avoid problems in future merges, do not record the result as a
     "merge". You want to cherry-pick partially; in other words, you would
     want to record this topology when you create this "merge" (which is
     not a merge):

             A---M
            /
    ---o---C---B

     with the same tree as you would have recorded in the first picture.
     For that, I think you could run "git merge --squash B" in step (1) in
     the main part of this message.

 (2) More importantly, your wish to take only one but not the other part
     of B is a strong indication that the branch B is doing too many
     things, either in a single commit or on a single branch, or
     both. Separate these distinct bits into different topic branches so
     that each individual bit can be independently merged to other
     branches.
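
A sketch of the first piece of advice, again in a throw-away repository with
made-up identity settings, file3 and commit messages. As a variation that is
an assumption of this sketch rather than something stated above, file2 is
restored from HEAD before committing instead of amending afterwards; the
recorded tree should be the same either way.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

# C: common ancestor.
echo "base" >file1
echo "base" >file2
git add file1 file2
git commit -q -m "C: base"
git checkout -q -b A

# B changes both files; A gets an unrelated commit.
git checkout -q -b B
echo "changed on B" >file1
echo "changed on B" >file2
git commit -q -a -m "B: change file1 and file2"
git checkout -q A
echo "only on A" >file3
git add file3
git commit -q -m "A: add file3"

# Stage the result of merging B without recording a merge.
git merge -q --squash B

# Drop B's change to file2 from the index and the working tree.
git checkout HEAD -- file2

# Record an ordinary single-parent commit.
git commit -q -m "M: take B's change to file1 only"

parents=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "words in 'rev-list --parents' output: $parents"
echo "file1: $(cat file1)"
echo "file2: $(cat file2)"
```

Because M has only one parent here, a later "git merge B" still considers
everything on B, including its change to file2.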

[Footnote]

*1* The other advantage is distributedness, which is "separation between
the act of committing and the act of publishing".

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Johannes Sixt @ 2009-03-02 10:30 UTC
  To: Peter Krefting; +Cc: git
In-Reply-To: <alpine.DEB.2.00.0903020941120.17877@perkele.intern.softwolves.pp.se>

Peter Krefting schrieb:
> When opening a file through open() or fopen(), the path passed is
> UTF-8 encoded.

I don't think that this assumption is valid. Whenever the Windows API has
to convert between Unicode strings and char* strings, it uses the current
"ANSI code page". As far as I know, the UTF-8 codepage (65001) cannot be
used as the "current ANSI code page". Users will always have some code
page set that is not UTF-8.

For example, if the user specifies a file name on the command line, then
it will not enter git in UTF-8, but in the current "ANSI" or "OEM code
page" encoding. If git prints a file name under the assumption that it is
UTF-8 encoded, then it will be displayed incorrectly because the system
uses a different encoding.
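
To make the mismatch concrete, here is a small shell sketch, assuming the
iconv and od utilities are available. ISO-8859-1 merely stands in for an
"ANSI code page" (an assumption for illustration; a real Windows system would
use something like code page 1252). The same character, "ä", is one byte in
that encoding and two bytes in UTF-8, so a program expecting one encoding will
misread names written in the other.

```shell
#!/bin/sh
# "ä" (U+00E4) as UTF-8, written with octal escapes so the script itself
# does not depend on the encoding of this file.
utf8_hex=$(printf '\303\244' | od -An -tx1 | tr -d ' \n')

# The same character converted to ISO-8859-1.
latin1_hex=$(printf '\303\244' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')

echo "UTF-8:      $utf8_hex"     # c3a4
echo "ISO-8859-1: $latin1_hex"   # e4
```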

> Since there is no real file system abstraction beyond using stdio
> (AFAIK), I need to hack it by replacing fopen (and open). Probably
> opendir/readdir as well (might be trickier), and possibly even hack
> around main() to parse the wchar_t command-line instead of the char copy.

I think you are grossly underestimating the venture that you want to
undertake here.

Please come up with a plan how you are going to deal with the various
issues. File names enter and leave the system through different channels:

- the command line and terminal window
- object database (tree objects)
- opendir/readdir; opening files or directories for reading or writing

And there are probably some more... How do you treat encodings in these
channels? What if the file names are not valid UTF-8? Etc.

The biggest obstacle will be that git does not have a notion of "file name
encoding" - it simply treats a file name as a stream of bytes. There is no
place to write an encoding. If the byte streams are regarded as having an
encoding, then you can have ambiguities, mixed encodings, or invalid
characters. You would have to deal with this in some way.

> This will lose all chances of Windows 9x compatibility, but I don't know
> if there are any attempts at supporting it anyway?

Windows 9x is already out of the loop. We use GetFileInformationByHandle()
that is only available since Windows 2000.

-- Hannes

* Re: [PATCH v2] send-email: add --confirm option
From: Felipe Contreras @ 2009-03-02 10:35 UTC
  To: Junio C Hamano; +Cc: Nanako Shiraishi, Jay Soffian, Paul Gortmaker, git
In-Reply-To: <7v7i385meo.fsf@gitster.siamese.dyndns.org>

On Mon, Mar 2, 2009 at 11:01 AM, Junio C Hamano <gitster@pobox.com> wrote:
> We are taking that route for 1.7.0 to warn very loudly about pushing into
> the currently checked-out branch in 1.6.2 and onwards.  We may now find
> out that people hate a loud deprecation period.  Then what?

The problem is not the 'loud deprecation period'; it's the deprecation
itself. You cannot avoid deprecation, so you cannot avoid users
complaining, but you can avoid surprises, and that's what the 'loud
deprecation period' is for.

The 'loud deprecation period' allows users to find out *earlier* so
that they can comment on the issue. If a huge number of users
complain, maybe the deprecation should not proceed, or maybe someone
comes up with a plan B. Still, some people would not be happy, but at
least their voice would have been heard.

Sure, it doesn't matter how it's handled, some people will still not be happy...

-- 
Felipe Contreras

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Peter Krefting @ 2009-03-02 10:46 UTC
  To: Johannes Sixt; +Cc: git
In-Reply-To: <49ABB529.1080500@viscovery.net>

Johannes Sixt:

> I don't think that this assumption is valid.

Depends on where you are coming from. For the files stored in the Git 
repositories, I believe all file names are supposed to be UTF-8 encoded 
(just like commit messages and user names are). That's the assumption I 
started working from.

> Users will always have some code page set that is not UTF-8.

Indeed. And as long as the char-pointer interfaces in stdio and elsewhere 
work on that assumption, we have a problem.

> For example, if the user specifies a file name on the command line, then
> it will not enter git in UTF-8, but in the current "ANSI" or "OEM code
> page" encoding.

That problem is already solved as we do have a wchar_t command line 
available. If you pass a file name that is not representable in the current 
"ANSI" codepage on the command line, it will come out as garbage in the 
char* version, but will be correct in the wchar_t* version. Thus we need to 
convert that to UTF-8 and use that instead.

> If git prints a file name under the assumption that it is UTF-8 encoded, 
> then it will be displayed incorrectly because the system uses a different 
> encoding.

Here setting the local codepage to UTF-8 *might* work, although I haven't 
tested that. Or always use the wchar_t versions of printf and friends.

> I think you are grossly underestimating the venture that you want to 
> undertake here.

I've done this before with other software, so, yes, I know it is quite a big 
undertaking. That is also why I started out with a minimal RFC patch to see 
if there was any interest in working with this.

> Please come up with a plan how you are going to deal with the various
> issues. File names enter and leave the system through different channels:
>
> - the command line and terminal window

GetCommandLineW() as described above.

> - object database (tree objects)

Those file names are supposedly always UTF-8.

> - opendir/readdir; opening files or directories for reading or writing

Wrap file open and directory read to use the wchar_t versions, converting 
that to UTF-8 strings at the API level.

> And there are probably some more... How do you treat encodings in these
> channels? What if the file names are not valid UTF-8? Etc.

Ill-formed UTF-8 should just be rejected. Invalid UTF-8 is worse. I'm not 
sure what the Linux version does, when running in a UTF-8 locale. Does it 
allow ill-formed or illegal UTF-8 sequences?

NTFS allows almost any sequence of wchar_t's, it doesn't even have to be 
valid UTF-16.
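
As a rough illustration of what "rejecting" ill-formed input could look like
outside of git, the following shell sketch uses iconv as a validator. This is
an assumption for illustration only, not how git or the patch checks names;
the helper name is_valid_utf8 is hypothetical.

```shell
#!/bin/sh
# Succeeds only if standard input is well-formed UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" encoded as UTF-8: accepted.
if printf 'caf\303\251' | is_valid_utf8; then good=yes; else good=no; fi

# "café" with a lone ISO-8859-1 byte for "é": rejected as UTF-8.
if printf 'caf\351' | is_valid_utf8; then bad=yes; else bad=no; fi

echo "well-formed input accepted: $good"
echo "ill-formed input accepted:  $bad"
```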

> The biggest obstacle will be that git does not have a notion of "file name 
> encoding" - it simply treats a file name as a stream of bytes.

Yeah, that is one of the major bugs in its design, IMHO. But almost everyone 
seems to assume that file names are UTF-8 strings anyway, so in the absence 
of any other information, it's as good an assumption as any to make.

> If the byte streams are regarded as having an encoding, then you can have 
> ambiguities, mixed encodings, or invalid characters. You would have to 
> deal with this in some way.

Considering we already see problems with file names that cannot properly be
represented on some file systems (case-only differences in the Linux kernel
when checked out on Windows; Mac OS' built-in Unicode normalization of file
names, etc.), this would not be an entirely new kind of problem.

> Windows 9x is already out of the loop.

Good.

-- 
\\// Peter - http://www.softwolves.pp.se/

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Johannes Schindelin @ 2009-03-02 10:56 UTC
  To: Peter Krefting; +Cc: Johannes Sixt, git
In-Reply-To: <alpine.DEB.2.00.0903021137110.17877@perkele.intern.softwolves.pp.se>

Hi,

On Mon, 2 Mar 2009, Peter Krefting wrote:

> Johannes Sixt:
> 
> > I don't think that this assumption is valid.
> 
> Depends on where you are coming from. For the files stored in the Git 
> repositories, I believe all file names are supposed to be UTF-8 encoded 
> (just like commit messages and user names are). That's the assumption I 
> started working from.

No.  As far as Git is concerned, the file names are just as much blobs as 
the file contents.

The fact that Windows messes with this notion just as it messes with the 
file contents (think the endless story whose name is CR/LF) shows only how 
"well" designed the concepts in Windows are.

And as it stands, we have at least two issues on the msysGit issue tracker 
that complain that Git does not work with localized file names properly.

So no, file names are not UTF-8 at all, especially not on Windows.

Do not get me wrong, I really welcome you taking care of the issue, but I 
do not think that forcing UTF-8 is a solution.

Thanks & sorry,
Dscho

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Peter Krefting @ 2009-03-02 12:03 UTC
  To: Johannes Schindelin; +Cc: Johannes Sixt, git
In-Reply-To: <alpine.DEB.1.00.0903021153520.10279@pacific.mpi-cbg.de>

Johannes Schindelin:

> No.  As far as Git is concerned, the file names are just as much blobs as 
> the file contents.

I've struggled with the same problems on Linux before, since its file
systems don't have the concept of characters, either. I guess it's just
design principles, but as far as I am concerned, having file names be 
constructed from characters makes a lot more sense than having them 
constructed from bytes.

Git does the right thing in assuming commit messages and user names to be UTF-8
characters, though; it would have been nice to have file names covered by
the same constraints.

> The fact that Windows messes with this notion just as it messes with the 
> file contents (think the endless story whose name is CR/LF) shows only how 
> "well" designed the concepts in Windows are.

In this case, yes, Windows' way of doing it does make more sense, at least to
me. And as far as text files are concerned, treating text as sequences of
bytes is in most cases not a very smart thing to do, either, but it's hard
not to given how most computers are constructed.

> And as it stands, we have at least two issues on the msysGit issue tracker 
> that complain that Git does not work with localized file names properly.
>
> So no, file names are not UTF-8 at all, especially not on Windows.

I am not trying to make file names *on Windows* be UTF-8. I am trying to
make file names on Windows be Windows file names, i.e. UTF-16 Unicode. It's
just that since Git internally uses the char* APIs, and from what I have
seen in most other cases assumes that char* text is UTF-8, I am trying to
convert from Windows' view of path names to Git's (UTF-16 to UTF-8) and back.

The other way would be to keep the char* APIs but convert to the Windows 
locale encoding ("ANSI codepage"), but that will break horribly as not all 
file names that can be used on a file system can be represented as such. 
Plus, every call to a Windows API using a char* path name *is* converted into
UTF-16 anyway, since that is what is used internally in the Windows NT 
subsystems.

> Do not get me wrong, I really welcome you taking care of the issue, but I
> do not think that forcing UTF-8 is a solution.

Some kind of handling of Git repositories where file names are not UTF-8 
would probably need to be added, yes.

-- 
\\// Peter - http://www.softwolves.pp.se/

* git-svn multiple branches and merging
From: Igor Lautar @ 2009-03-02 12:09 UTC
  To: git

Hi All,

I'm using git-svn to manage a quite large svn repository. This
repository also does not follow 'general' svn rules about how to name
branches.

So we have something like:
trunk -> development
branches\version1 -> version1 maintenance
branches\custom\version1_fix -> customized version1 with certain fixes

etc.

When importing, I've only imported trunk and branches I'm interested
in. Thus, I have multiple remotes that git-svn does not know
are related (or how they branched from each other). Also, I have not
imported the whole history, as it's just too much trouble.

Now, I want to start a new branch, let's say branches\dev1, which is
branched from trunk. This will be used for various improvements, which
do not go to trunk immediately.
I also want to keep this branch in sync with main trunk.

Up to now, I have been doing this by running git-cherry-pick on all
changes since the dev1 branch point. Is there a better way to do this?
Note that branch dev1
in git-svn does not know about previous commits in trunk (git remote
ref was initialized from branch point for dev1).

Just merging trunk (represented by a remote in git-svn) makes a mess
(as expected). Basically, what I want to do is tell git-svn that a merge
was already done up to a certain point from that branch so git-merge
then only picks up new changes from that point on (and the ones that
have not been cherry-picked).

Is there a way to get out of this mess? I'm fine with cherry-pick, but
it requires some manual labor (like remembering/finding last
cherry-picked commit).

Thank you,
Igor

* Re: [PATCH v2] send-email: add --confirm option
From: Jay Soffian @ 2009-03-02 12:33 UTC
  To: Nanako Shiraishi; +Cc: Paul Gortmaker, git, Junio C Hamano
In-Reply-To: <20090302172401.6117@nanako3.lavabit.com>

On Mon, Mar 2, 2009 at 3:24 AM, Nanako Shiraishi <nanako3@lavabit.com> wrote:
> By the way, I don't think the lesson you should take home is the need for an escape hatch. Read the message by Junio on August 24th, 2008. Being nice and not too loud during the deprecation period kept users complacent about upcoming changes and upset them when the change finally came. Being un-nice and too loud during the deprecation period would have upset them early instead. You cannot avoid upsetting users either way whenever you change the behavior. That's the lesson you should learn.

Thank you for correcting me, and I apologize for the misinformation.

It is very difficult to strike a balance between the users who are
upset enough by the change to be vocal about it and the users who
benefit from the change but whom this list never hears from, but I
appreciate that Junio is trying to accommodate both.

j.

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Johannes Sixt @ 2009-03-02 12:34 UTC (permalink / raw)
  To: Peter Krefting; +Cc: git
In-Reply-To: <alpine.DEB.2.00.0903021137110.17877@perkele.intern.softwolves.pp.se>

Peter Krefting schrieb:
> Johannes Sixt:
>> If git prints a file name under the assumption that it is UTF-8
>> encoded, then it will be displayed incorrectly because the system uses
>> a different encoding.
> 
> Here setting the local codepage to UTF-8 *might* work, although I
> haven't tested that. Or always use the wchar_t versions of printf and
> friends.

You cannot expect users to switch the locale. For example, I have to test
our software with Japanese settings: I *cannot* switch to UTF-8 just
because of git.

Can you set the local codepage per program? (I don't know.) It might help
here, but it doesn't help in all cases, particularly in certain pipelines:

  git ls-files -o
  git ls-files -o | git update-index --add --stdin
  find . -name \*.jpg | git update-index --add --stdin

- What encoding should 'ls-files' use for its output? Certainly not always
UTF-8: stdout should use the local code page so that the file names are
interpreted correctly by the terminal window (it expects the local code page).

- What encoding should 'update-index' expect from its input? Can you be
sure that other programs generate UTF-8 output?

How do you solve that?
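
A small sketch of the mismatch behind these questions, written in Python
rather than anything git itself uses. The file name and the choice of
cp932 (a common Japanese Windows code page) are illustrative
assumptions, not taken from this thread.

```python
# Illustrative only: what happens when one side of a pipe or terminal
# assumes UTF-8 and the other side assumes the local code page.
name = "日本語.txt"   # hypothetical file name
local_cp = "cp932"    # assumed local code page (Japanese Windows)


def shown_by_terminal(raw: bytes, terminal_encoding: str) -> str:
    """Decode bytes the way a terminal using terminal_encoding would."""
    return raw.decode(terminal_encoding, errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A producer writing UTF-8 to a terminal that expects the local code page:
garbled = shown_by_terminal(name.encode("utf-8"), local_cp)

# A producer writing local-code-page bytes to a reader that expects UTF-8:
accepted = is_valid_utf8(name.encode(local_cp))
```

In this sketch `garbled` no longer equals `name`, and `accepted` is
False: neither direction works unless both ends agree on one encoding.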

-- Hannes

* Re: [PATCH v2] send-email: add --confirm option
From: Jay Soffian @ 2009-03-02 12:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nanako Shiraishi, Paul Gortmaker
In-Reply-To: <7vwsb85qe9.fsf@gitster.siamese.dyndns.org>

On Mon, Mar 2, 2009 at 2:34 AM, Junio C Hamano <gitster@pobox.com> wrote:
> In any case, with the lesson I learned from the post v1.6.0 fiasco, it
> might make sense to make the command louder when it needs to look at the
> setting of $confirm option and when the user does not have anything in the
> config nor command line.
>
> What I mean is this.

Okay, I'll re-send. Thanks for the feedback.

j.

* Re: [RFC] Refspec patterns with * in the middle
From: Jay Soffian @ 2009-03-02 12:54 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git
In-Reply-To: <alpine.LNX.1.00.0903011820590.19665@iabervon.org>

On Sun, Mar 1, 2009 at 6:42 PM, Daniel Barkalow <barkalow@iabervon.org> wrote:
> I've got an annoying repository where all of the branches upstream[*] have
> names, for a project "my-proj" like:
>
> some/constant/stuff/$VERSION/junk/my-proj
>
> I'd like to be able to use refspecs like:
>
>  fetch = some/constant/stuff/*/junk/my-proj:refs/remotes/origin/*
>
> I've written an implementation (which mainly involved having only one
> place do the matching and replacement for pattern refspecs, and then
> making that one place a little more clever), so that's not hard. But we
> currently prohibit refspecs like this, and I think we may want to prohibit
> some patterns of this general form, in order to keep typos from doing
> surprising things.
>
> My use case is actually, more precisely:
>
> some/constant/stuff/$PROJ-$NUMBER/junk/my-proj
>
> Where $NUMBER is the version number, and $PROJ is usually, but not quite
> always "my-proj"; the exception being that it might be effectively a
> superproject. So I'd like to have:
>
>  fetch = some/constant/stuff/my-proj-*/junk/my-proj:refs/remotes/origin/*
>
> But I can live with remote branches like "my-proj-2.4" instead of "2.4".
>
> I think it would make sense, and limit typo damage, to say that the * can
> only expand to something with a '/' in it if the star has a slash or the
> end of the string on each side.

That seems more confusing than just saying: '*' matches everything but
the path separator ('/'), and whatever it matches on the LHS of the
':' is what it expands to on the RHS. I'm not sure how a typo would
damage anything, but this could always be enabled with
core.refspec.glob_anywhere or some such.
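
The rule proposed here can be sketched in a few lines of Python. This
is only an illustration of the matching rule, not git's refspec code;
the function name `expand_refspec` is made up, and rejecting an empty
match is an extra assumption of the sketch.

```python
def expand_refspec(src, dst, ref):
    """Map ref through the refspec src:dst, or return None if it does not match.

    A single '*' in src matches any run of characters except '/', and
    whatever it matched replaces the '*' in dst.
    """
    prefix, star, suffix = src.partition("*")
    if not star:
        # No wildcard: only an exact match maps to dst.
        return dst if ref == src else None
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    middle = ref[len(prefix):len(ref) - len(suffix)]
    if not middle or "/" in middle:
        # Assumption: an empty match is rejected; '/' never matches.
        return None
    return dst.replace("*", middle, 1)
```

With the pattern from the quoted message,
`expand_refspec("some/constant/stuff/*/junk/my-proj", "refs/remotes/origin/*", ref)`
maps a ref whose version component is `2.4` to
`refs/remotes/origin/2.4`, and returns None when the wildcard would
have to span a '/'.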

I think regex support is too much:

  fetch = some/constant/stuff/(my-proj-[^/]*)/junk/my-proj:refs/remotes/origin/\1

(Which in a git config, I think may need a double-backslash, but I
forget what the config parser does.)
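
For comparison, the regex form behaves like this when tried out with
Python's `re` module. This only shows what the pattern and the `\1`
back-reference would do; git has no such refspec syntax, and the ref
names used are made up.

```python
import re

# Pattern and replacement copied from the example above.
PATTERN = re.compile(r"some/constant/stuff/(my-proj-[^/]*)/junk/my-proj")
REPLACEMENT = r"refs/remotes/origin/\1"


def map_ref(ref):
    """Return the local ref for a remote ref, or None if it does not match."""
    m = PATTERN.fullmatch(ref)
    return m.expand(REPLACEMENT) if m else None
```

Here `map_ref("some/constant/stuff/my-proj-2.4/junk/my-proj")` gives
`refs/remotes/origin/my-proj-2.4`, since the group captures the whole
`my-proj-2.4` component.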

j.

* Re: merge, keeping the remote as a new file?
From: Jay Soffian @ 2009-03-02 13:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Björn Steinbrink, Charles Bailey, Caleb Cushing, git
In-Reply-To: <20090302070406.GA12937@coredump.intra.peff.net>

On Mon, Mar 2, 2009 at 2:04 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Mar 02, 2009 at 07:59:49AM +0100, Björn Steinbrink wrote:
>
>> Hm, how about this?
>> git checkout --theirs file
>> git mv file newname
>> git checkout HEAD file # Can't use --ours here due to the mv
>
> Actually, you can use --ours if you don't "git mv":
>
>  git checkout --theirs file
>  mv file newfile
>  git checkout --ours file
>  git add file newfile
>
> One more command, but I think more obvious about what is going on (and I
> think both are better than the other suggestions).

This is a superior answer as well because it avoids plumbing in a
situation where plumbing ought not be needed.

j.

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Peter Krefting @ 2009-03-02 13:12 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git
In-Reply-To: <49ABD24B.5060005@viscovery.net>

Johannes Sixt:

> Can you set the local codepage per program? (I don't know.)

The locale is set per thread, and gets reset when the program exits. So 
setting the codepage to UTF-8 before outputting should work. That should 
also work for displaying the log to the terminal if you have UTF-8 log 
messages.

Converting it to wchar_t and using wprintf and similar should be safer, 
though (and I have no idea what happens if you try to pipe the output to 
something else).

> - What encoding should 'ls-files' use for its output? Certainly not always 
> UTF-8: stdout should use the local code page so that the file names are 
> interpreted correctly by the terminal window (it expects the local code 
> page).

That is exactly why trying to mix "protocol" data ("plumbing" in Git's case)
and user output will always come back and bite you, one way or another. I
haven't really the faintest idea how pipes work with Unicode on Windows.
Somewhere along the line there will probably be some conversions, which
would cause interesting issues.

Better not use pipes, then. Heh. I sense that there is a slight problem with 
the architecture of Git and trying to get it to behave on Windows... :-)

> - What encoding should 'update-index' expect from its input? Can you be 
> sure that other programs generate UTF-8 output?

Theoretically, if all the internal stuff is hacked around to output Unicode, 
and the thread codepage is set up to use UTF-8, it should "just work". And 
if run directly from the shell, it should still be converted to whatever the 
system is set up to emit. That would mean, however, that a Git program that 
internally runs

   git-foo | git-bar | git-gazonk

might behave differently than if a user entered it on the
command-line.

-- 
\\// Peter - http://www.softwolves.pp.se/

* http: a non-curl_multi interface?
From: Tay Ray Chuan @ 2009-03-02 13:14 UTC (permalink / raw)
  To: git

Hi,

there have been several complaints about how git uses curl, particularly
how it forces one to use curl's multi interface, so I've tried my hand
at implementing a curl interface that doesn't need curl_multi.

This would allow git to work without curl_multi.

The non-curl_multi set of functions is done, and I've also taught
push and http-push a --persistent option, which forces git to behave
as though USE_CURL_MULTI isn't defined.

Do you guys think this would bring any benefits, apart from requiring
the user to use a curl library with the multi interface? Based on what
I read in the docs, this would mean less open/closed connections,
minimized credential prompting (if authentication is required), more
backward compatibility, but it would also mean a possible performance
degradation in git, since all http requests are sequential.
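
A toy model of that trade-off, in Python. It is not a measurement and
does not use curl at all; the latencies are hypothetical, and bandwidth
limits and connection setup costs are ignored.

```python
def sequential_time(latencies):
    """Total time when each request starts only after the previous one ends."""
    return sum(latencies)


def concurrent_time(latencies, max_parallel):
    """Total time when up to max_parallel requests run at once.

    Each request is handed to whichever slot becomes free first.
    """
    slots = [0] * max_parallel
    for t in latencies:
        i = slots.index(min(slots))
        slots[i] += t
    return max(slots)


# Four hypothetical requests of 200 ms each.
requests_ms = [200, 200, 200, 200]
```

Under these assumptions the sequential case takes 800 ms, against
400 ms with two parallel slots and 200 ms with four. Real numbers would
have to come from timing both code paths.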

--
Cheers,
Ray Chuan

* Re: http: a non-curl_multi interface?
From: Daniel Stenberg @ 2009-03-02 13:26 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git
In-Reply-To: <be6fef0d0903020514h28995ec2v2acd9f65131c1515@mail.gmail.com>

On Mon, 2 Mar 2009, Tay Ray Chuan wrote:

I'm replying on this topic as a libcurl guy; I don't know much about git
internals.

> Do you guys think this would bring any benefits, apart from requiring
> the user to use a curl library with the multi interface?

You mean NOT requiring then I guess.

What I don't quite grasp (and I must admit I have not followed the critique on
this matter) is why using the multi interface of libcurl is a problem to
anyone, as all libcurl versions in modern times feature it. And if you have a
libcurl in which it works badly, your libcurl is too old anyway and you should
rather upgrade...

> Based on what I read in the docs, this would mean less open/closed 
> connections,

I don't see how that is true. In fact, properly used I would claim that an
application using the multi interface would in general use fewer connections
and do more connection re-use than otherwise. But of course it depends on a
lot of factors.

Again, this requires a reasonably recent libcurl (since 7.16.0 - October 2006
- libcurl keeps the "connection cache" in the multi handle instead of in each
individual easy handle.)

> minimized credential prompting (if authentication is required), more 
> backward compatibility, but it would also mean a possible performance 
> degradation in git, since all http requests are sequential.

I figure you can test that fairly easily now that you have a patch pending for
this change and the existing code base is using the multi interface
approach...

-- 

  / daniel.haxx.se

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Peter Krefting @ 2009-03-02 13:57 UTC (permalink / raw)
  To: git
In-Reply-To: <a2633edd0903020512u5682e9am203f0faccd0acf6a@mail.gmail.com>

Hi!

> Makes sense too. I think the whole API would have to be changed to use 
> TCHAR*.

I'd rather just say wchar_t explicitly. I'm not particularly fond of macros
that change under your feet just because you fail to define a symbol 
somewhere...

> Then you need to do the right conversion at the right places, this will be 
> quite tricky, painful work, but there is probably no way around that.

In the other project I worked on we ended up wrapping all file-related calls 
in our own porting interface, and then let each platform we compiled for 
implement their own methods for handling Unicode paths. For Windows it's 
trivial since all APIs are Unicode. For Unix-like OSes it's tricky as you 
have to take the locale settings into account, but fortunately the world is 
slowly moving towards UTF-8 locales, which eases the pain a bit.

> Note that not only conversions will be needed but you'll also need to 
> adjust all routines handling filenames to use the proper Unicode version. 
> (strchr -> _tstrchr, open -> _topen, strcpy -> _tstrcpy, strlen -> 
> _tcslen, ...).

Not necessarily. If the code can be set up to use UTF-8 char* internally, 
not everything needs to be rewritten (I've done that too, only took a 
couple of years to move the codebase over to all-Unicode).
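
A rough Python illustration of why that can work. The idea is to keep
paths as UTF-8 bytes internally and convert only where a
wide-character API is called. The helper name and the path are
hypothetical, and UTF-16-LE stands in for what a Windows wide-character
call would expect.

```python
def to_api_boundary(path_utf8: bytes) -> bytes:
    """Convert an internal UTF-8 path to UTF-16-LE at the API boundary."""
    return path_utf8.decode("utf-8").encode("utf-16-le")


internal = "docs/日本語/readme.txt".encode("utf-8")  # hypothetical path

# Byte-oriented code (the strchr/strlen style) keeps working on UTF-8:
# bytes below 0x80 only ever encode ASCII characters, so searching for
# '/' cannot hit the middle of a multi-byte character.
components = internal.split(b"/")

wide = to_api_boundary(internal)
```

Only code that calls into the platform has to know about the
conversion; the splitting and copying in between can stay byte-based.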

-- 
\\// Peter - http://www.softwolves.pp.se/

* Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.
From: Thomas Rast @ 2009-03-02 14:29 UTC (permalink / raw)
  To: Peter Krefting; +Cc: git
In-Reply-To: <alpine.DEB.2.00.0903021452010.17877@perkele.intern.softwolves.pp.se>

Peter Krefting wrote:
> In the other project I worked on we ended up wrapping all file-related calls 
> in our own porting interface, and then let each platform we compiled for 
> implement their own methods for handling Unicode paths. For Windows it's 
> trivial since all APIs are Unicode. For Unix-like OSes it's tricky as you 
> have to take the locale settings into account, but fortunately the world is 
> slowly moving towards UTF-8 locales, which eases the pain a bit.

Have you thought about all the consequences this would have for the
*nix people here? [*]

Even if you pretend that Git did always enforce UTF-8 paths in its
trees, so that there's no backward compatibility to be cared for,
you're still in a world of hurt when trying to check out such paths
under a locale (or whatever setting might control this new encoding
logic) that does not support the whole range of UTF-8.

Like, say, the C locale.
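
To make that concrete, here is a small Python check. It assumes that
the C locale's character set can be approximated by plain ASCII, and
the tree entries are invented.

```python
def representable(name: str, encoding: str) -> bool:
    """Return True if name can be written in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical tree entries, stored as UTF-8 in the repository.
paths = ["README", "docs/日本語.txt", "src/naïve.c"]

# Entries that an ASCII-only locale has no way to spell on disk.
unrepresentable = [p for p in paths if not representable(p, "ascii")]
```

Every entry that ends up in `unrepresentable` is a path that could
neither be checked out under its real name nor typed on the command
line in such a locale.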

Next you get to see to it that the users can spell all filenames even
if their locale doesn't let them, since they'll want to do things like
'git show $rev:$file' with them.

With backwards compatibility it's even worse as you're suddenly
imposing extra restrictions on what a valid filename in the repository
must look like.

[*] I'm _extremely_ tempted to write "people using non-broken OSes",
but let's pretend to be neutral for a second.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Bug in Git-Gui - Creates corrupt patch
From: Grzegorz Kossakowski @ 2009-03-02 14:34 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: 4jxDQ6FQee2H, spearce, git
In-Reply-To: <49A567C9.5050203@viscovery.net>

Johannes Sixt pisze:
> 4jxDQ6FQee2H@dyweni.com schrieb:
>> 3. Using git-gui, try to stage *only* the last line marked for removal
>> (should be '-	}').
>>
>> I get 'fatal: corrupt patch at line 22'.
> 
> "Stage/Unstage line" does not work for files that have
> 
> \ No newline at end of file

I've just stumbled across this problem. Does the above imply that the reported problem is not considered a bug?

If so, I believe that git gui should enforce a newline at the end of a file, or at least provide a more meaningful error message.
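
A sketch of the kind of check I have in mind. git-gui itself is written
in Tcl/Tk, so this Python is only an illustration; the function names
and the wording of the message are made up.

```python
def lacks_final_newline(data: bytes) -> bool:
    """Return True for non-empty content that does not end in a newline."""
    return len(data) > 0 and not data.endswith(b"\n")


def staging_warning(path: str, data: bytes):
    """Return a readable warning for such files, or None if there is no issue."""
    if lacks_final_newline(data):
        return (f"{path}: no newline at end of file; "
                "staging individual lines may not work for this file")
    return None
```

A message along these lines would at least tell the user what is
wrong, instead of 'fatal: corrupt patch at line 22'.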

-- 
Best regards,
Grzegorz Kossakowski
