From: Jakub Narebski <jnareb@gmail.com>
To: "Zoltán Füzesi" <zfuzesi@eaglet.hu>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] gitweb: parse_commit_text encoding fix
Date: Fri, 7 Aug 2009 22:31:33 +0200
Message-ID: <200908072231.35707.jnareb@gmail.com>
In-Reply-To: <9ab80d150908060115q4b56b2e5xb327e09cda7e2b7a@mail.gmail.com>

On Thu, 6 Aug 2009, Zoltán Füzesi wrote:
> 2009/8/4 Junio C Hamano <gitster@pobox.com>:
> >
> > Thanks, Zoltán.
> >
> > We should be able to set up a script that scrapes the output to test this
> > kind of thing.  We may not want to have a test pattern that matches too
> > strictly for the current structure and appearance of the output
> > (e.g. counting nested <div>s, presentation styles and such), but if we can
> > robustly scrape off HTML tags (e.g. "elinks -dump") and check the
> > remaining payload, it might be enough.
> >
> > Jakub, what do you think?  I suspect that the scraping approach may
> > turn out to be too fragile for tests to be worth doing, but I am just
> > throwing out a thought.
> >
> 
> This issue shows up when the chop_and_escape_str function is called
> with a non-ASCII string (like my name :)) without first calling to_utf8
> on it. "author_name" and "committer_name" are two examples, and
> "author_name" shows up with bad encoding in HTML.
> 
> Example from one of my repos (little piece from shortlog output):
> <td class="author"><span title="Füzesi Zoltán">Füzesi Zoltán</span></td>
> After applying the patch:
> <td class="author">Füzesi Zoltán</td>
> 
> This is an "old" (seen in version 1.5.6 too) and, I think, minor issue.
> I haven't yet spent time thinking about how a test script could show
> this. Waiting for Jakub's reaction.

Oh, so the problem is not only to have correct output (for example
"Füzesi Zoltán" somewhere on the HTML page produced by gitweb), but also
not to have incorrect output (for example "FÃ¼zesi ZoltÃ¡n").
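A minimal Python sketch of the failure mode Zoltán describes. This is an illustration, not gitweb code, with Python standing in for Perl. When the UTF-8 bytes of a name are used without being decoded (the situation when to_utf8 is skipped), every byte counts as one character. A length-based chop therefore sees 15 characters instead of 13. Assuming the page is then written out as UTF-8, the text comes out double-encoded, which is the kind of incorrect output a test should reject.

```python
# Hypothetical illustration (not gitweb code): using a UTF-8 byte string
# without first decoding it, cf. skipping gitweb's to_utf8().
name = "Füzesi Zoltán"

# The raw bytes, as they would come out of git (e.g. the author header).
raw = name.encode("utf-8")

# Skipping the decode step: every byte becomes one character, as if the
# data were Latin-1. This is what a length-based chop would operate on.
undecoded = raw.decode("latin-1")

# Correct handling: decode the UTF-8 bytes to text first.
correct = raw.decode("utf-8")

print(len(correct), len(undecoded))  # -> 13 15 (characters vs. bytes)
print(correct)                       # -> Füzesi Zoltán
print(undecoded)                     # -> FÃ¼zesi ZoltÃ¡n (double-encoded on a UTF-8 page)
```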

I think it would be better to keep t9500-gitweb-standalone-no-errors.sh
only about checking for Perl errors and Perl warnings.  So I'd rather
have the test that checks whether gitweb handles non-US-ASCII output
correctly in a separate script, e.g. t9501-gitweb-standalone-i18n.sh.
That would mean extracting gitweb_init() and gitweb_run() (and perhaps
also gitweb_check_prereq() or something similar) into a common file,
t/lib-gitweb.sh.

We would check, e.g., whether "startáąend" is present in the output
(correct output), and whether extracting every match of "start[^ ]*end"
produces only "startáąend" (no incorrect output).
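The two-part check can be sketched in Python as follows. The function name and the sample HTML are hypothetical. A real t9501 script would presumably do this in shell with grep. One deliberate deviation from the pattern above: the sketch uses the non-greedy "start[^ ]*?end". A greedy match would merge adjacent occurrences inside markup such as `title="startáąend">startáąend` into one match, and correct output would then fail the check.

```python
import re

# The marker string from the proposal; assumed to appear in a commit
# that the test repository is set up with.
EXPECTED = "startáąend"

# Non-greedy variant (an assumption) so that occurrences not separated by
# spaces, e.g. in an attribute and the element text, match separately.
MARKER_RE = re.compile(r"start[^ ]*?end")


def check_i18n_output(html: str) -> bool:
    """Hypothetical check: correct output present, no incorrect variant."""
    # 1. Correct output: the properly encoded marker must be present.
    if EXPECTED not in html:
        return False
    # 2. No incorrect output: every start...end match must equal the marker,
    #    so a mis-encoded copy anywhere on the page fails the check.
    return all(match == EXPECTED for match in MARKER_RE.findall(html))
```

For example, `check_i18n_output('<span title="startáąend">startáąend</span>')` returns True. Appending a double-encoded copy of the marker makes it return False.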


As for gitweb, we should make sure that everything is stored in Perl
variables and Perl structures _after_ being passed through to_utf8().
This would require some cleanup of the code, and having such a test
would help check that we don't introduce any regressions.
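The "decode before storing" rule can be sketched in Python as decoding once at the boundary. Every name here is hypothetical: to_utf8, CommitInfo, parse_commit_fields and FALLBACK_ENCODING. The Latin-1 fallback for bytes that are not valid UTF-8 is an assumption modelled on the general idea of a fallback encoding. It is not a description of gitweb's exact code.

```python
from dataclasses import dataclass
from typing import Mapping

# Assumption: one fallback encoding for byte strings that are not valid UTF-8.
FALLBACK_ENCODING = "latin-1"


def to_utf8(raw: bytes) -> str:
    """Hypothetical analogue of gitweb's to_utf8(): bytes in, text out."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode(FALLBACK_ENCODING)


@dataclass
class CommitInfo:
    """Hypothetical container: every field holds already-decoded text."""
    author_name: str
    committer_name: str
    title: str


def parse_commit_fields(fields: Mapping[str, bytes]) -> CommitInfo:
    # Decode once, before anything is stored, so later code (chopping,
    # escaping, output) only ever sees characters, never raw bytes.
    return CommitInfo(
        author_name=to_utf8(fields["author_name"]),
        committer_name=to_utf8(fields["committer_name"]),
        title=to_utf8(fields["title"]),
    )
```

With this shape, a helper in the role of chop_and_escape_str never receives undecoded bytes. That is the invariant the proposed i18n test would guard.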

-- 
Jakub Narebski
Poland


Thread overview: 8+ messages
2009-08-01  8:28 [PATCH/RFC] gitweb: parse_commit_text encoding fix Zoltán Füzesi
2009-08-01  9:21 ` Jakub Narebski
2009-08-01 16:55   ` Füzesi Zoltán
2009-08-02  7:42     ` [PATCH] " Zoltán Füzesi
2009-08-04  6:59       ` Junio C Hamano
2009-08-06  8:15         ` Zoltán Füzesi
2009-08-07 20:31           ` Jakub Narebski [this message]
2009-08-07  0:41         ` Jakub Narebski
