All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: jukio@cox.net
Cc: git@vger.kernel.org
Subject: UTF-8 trouble with GIT 1.5.0.3
Date: Sat, 17 Mar 2007 11:27:30 +0100	[thread overview]
Message-ID: <20070317102730.GA15256@1wt.eu> (raw)

Hi Junio,

after upgrading from GIT 1.4 to 1.5, I noticed that I had got
errors when applying patches from mailboxes and that I had to 
manually add "--no-utf8" to git-am. The error was always :

  "fatal: cannot convert from latin1 to utf-8".

This morning, I decided to take a closer look at it, first
trying to fix it in the configuration. Even this fails :

[i18n]
        commitencoding = latin1

$ git-am -3 username.patch 
fatal: cannot convert from latin1 to latin1


So I took a look at the code and noticed that in builtin-mailinfo,
convert_to_utf8() is always called because metainfo_charset is
always set. Also, this function always calls reencode_string(),
no matter what input and outputs are passed. And since it fails
for an unknown reason, I get the error message.

Now I have several options :
 
 - always pass --no-utf8 with git-am to leave metainfo_charset
   NULL and avoid the call to convert_to_utf8()
   => very annoying and I have to rm -rf .dotest every time I forget
      about it

 - add a check to convert_to_utf8() so that when input and output
   charsets are equal, it does not malloc/convert/copy/free the
   result. This would seem necessary anyway.

 - check for empty charset name instead of just a NULL pointer,
   so that it is still possible by configuration to stay compatible
   with old default behaviour which is : not to convert by default.

 - switch back to 1.4 : this will be my immediate option while
   waiting for any advice on the right long-term solution.

I would be glad to get a recommendation for one of the solutions
above. I can workout a patch, but I don't want to waste my time
doing crappy patches that will not be retained.

Please note that I don't know a damn thing about the way iconv
works, which may be one of the reason I really hate it. So it
will take far too much time to me to dig below reencode_string().

Also, I am really worried for buffer overflows here. Looking at
the code, it seems like headers from input mails are passed to
decode_header(), which itself calls decode_header_bq() which in
turns calls convert_to_utf8() with input and output buffers of
256 bytes. Same for mailinfo() which calls check_header_line()
with <line> which has been filled up to sizeof(line) (1000 bytes).

Unless I'm mistaken, encoding from latin1 to utf-8 can grow the
string by a 5-to-1 ratio. So this means that the blind strcpy()
in convert_to_utf8() could overflow on user-crafted mail headers.
Maybe I overlooked something :-/

Regards,
Willy

                 reply	other threads:[~2007-03-17 10:38 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070317102730.GA15256@1wt.eu \
    --to=w@1wt.eu \
    --cc=git@vger.kernel.org \
    --cc=jukio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.