git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Erik Faye-Lund <kusmabite@gmail.com>
Cc: git@vger.kernel.org, msysgit@googlegroups.com
Subject: Re: [PATCH/RFC] contrib: add win32 credential-helper
Date: Wed, 4 Apr 2012 07:23:42 -0400	[thread overview]
Message-ID: <20120404112341.GA28354@sigill.intra.peff.net> (raw)
In-Reply-To: <CABPQNSYRxZHSW_urpH7yh=Bct2-X2TAsd3=jaJF_9Rb-foegLA@mail.gmail.com>

On Wed, Apr 04, 2012 at 12:37:25PM +0200, Erik Faye-Lund wrote:

> Sure, but this is a problem with HTTP-auth, we don't have to make it a
> problem of our credential-api as well. For instance,  we could try to
> send the credentials to the server as UTF-8 first, and if that fails
> re-try with ISO-8859-1. If neither works, we probably need more info;
> if it's worth it somebody could implement a per-remote locale or
> something. I'm not saying we should, but it's something that could be
> done if the HTTP-auth issue proves to be real.

Ugh. The retry thing is just hideous. I'd much rather wait for a real
solution like the draft rfc I mentioned, and then implement that (or
better yet, it is probably something that curl would implement). But in
the meantime, I haven't seen a single report on the list that what git
is doing is a problem, and I'd much rather come to the same solution
that browsers do than invent some git-specific junk.

> Losing the encoding EARLY isn't any better than trying to preserve it
> for as long as possible, and only lose it when we KNOW we're sending
> this down a path where the encoding is undefined?

I see what you're saying, but there are two points against this:

  1. We don't necessarily even know what the original encoding is.

  2. All code paths currently end in a place where we drop the encoding
     (http auth and certificate passwords). So it's sort of a non-issue
     at this point.

> And passing UTF-8 data when the encoding is undefined is probably one
> of the less insane things to attempt, no?

Not necessarily. If the user types latin1 and their webserver expects
latin1, then we will be breaking their setup. Preserving the bytes as-is
means we will do the right thing in that case. It means we will do the
wrong thing when the user types latin1 and the webserver expects utf-8,
but that is _already_ broken even without credential helpers. Have we
seen any bug reports?

> The problem is that for some helpers you need to know the encoding in
> order to have a user-interface show the data. This goes both for OSX
> and Windows. If you mess up the encoding, in the helper, then you'll
> mess up the entries in the OS'es key-managers. While this might not be
> a problem in practice for US-English users, we've seen for Git for
> Windows that e.g. some Asians are really unhappy with their username
> becoming some completely incomprehensible string.

How are they inputting their username? If they are seeing garbage, then
surely it is coming to git in some encoding that is different than what
the password manager expects. Then we give it to a helper, which hands
it to system storage, which assumes that what we give it is utf-8. So
the error there seems to me to be the layer between the helper and the
system storage, where the string is converted from "a stream of bytes"
to "a utf-8 encoded string".

So why is this a credential helper protocol problem, and not an
implementation issue for the specific helper? How does git find out the
encoding? Presumably by checking $LANG, or whatever system-specific
locale information there is. Can't the helper do the same thing?

> Saying "Hey, I'm just passing on what I got here, don't blame the
> messenger" isn't a solution, it's adding to the problem. I'm sorry if
> I'm a bit blunt here, but being a pass-through is a sorry excuse. User
> names and passwords are text-strings by definition; "name" and "word"
> both imply it. No one has a binary blob of 0xbadf00d as a username or
> password.

I never argued that people are putting in binary blobs. I do argue that
we don't necessarily know the encoding of the input, nor what encoding
is expected for the output. So by preserving bytes, we can at least not
make the situation worse.  The username git sees is some bytes from a
file handle. How do we know what encoding it's in? The concept of "text
string" is not well-defined in the Unix world, and data tends to be
considered as streams of bytes.

We can guess from $LANG, though I'm not sure how foolproof that is in
practice (more on that below).

> Now, my suggestion was to define that these are UTF-8; if we pass it
> UTF-8 it will still give back UTF-8. But it can be converted
> internally so the OS can show the credential in it's manager.

Which implies that git must be converting what the user gives us into
UTF-8. And then converting it back (to what?) when we get the data back
from the helper, before we send it to the webserver? Or do we always
send UTF-8 to the webserver? Won't that break setups that currently
work?

> What can of worms? Shouldn't getpass simply get the password in the
> current system locale? When the user inputs the string is exactly the
> moment we want to convert the "external locale" into a known encoding.

Is that always accurate on every system? If so, then why don't we do it
in every place that we take input from the user? I suspect the answer is
a fear that we would end up corrupting data as often as we end up
helping.

Maybe that fear is unfounded (it is not my decision, but rather general
git design). Maybe this is just a Unix thing, where the attitude is
usually that data is a stream of bytes. I don't know.

-Peff

  reply	other threads:[~2012-04-04 11:23 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-19 23:06 [PATCH/RFC] contrib: add win32 credential-helper Erik Faye-Lund
2012-03-23 21:10 ` Jeff King
2012-04-02 15:53   ` Erik Faye-Lund
2012-04-03  8:44     ` Erik Faye-Lund
2012-04-04 10:10     ` Jeff King
2012-04-04 10:37       ` Erik Faye-Lund
2012-04-04 11:23         ` Jeff King [this message]
2012-04-04 17:44           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120404112341.GA28354@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=kusmabite@gmail.com \
    --cc=msysgit@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).