From: Andrew Morton <akpm@linux-foundation.org>
To: Joe Perches <joe@perches.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Andy Whitcroft <apw@shadowen.org>
Subject: Re: [PATCH] checkpatch: Add a --strict check for utf-8 in commit logs
Date: Mon, 10 Oct 2011 15:42:05 -0700
Message-ID: <20111010154205.02c3bd3e.akpm@linux-foundation.org>
In-Reply-To: <1318285950.2149.24.camel@Joe-Laptop>

On Mon, 10 Oct 2011 15:32:30 -0700
Joe Perches <joe@perches.com> wrote:

> Some find using utf-8 in commit logs inappropriate.
> 
> Some patch commit logs contain unintended utf-8 characters
> when doing things like copy/pasting compilation output.
> 
> Look for the start of any commit log by skipping initial
> lines that look like email headers and "From: " lines.
> 
> Stop looking for utf-8 at the first signature line.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Joe Perches <joe@perches.com>
> 
> ---
> 
> I don't feel strongly that this patch should be applied,
> that's why it's a --strict check and not on by default,
> but Andrew Morton seems to want something like it...

Mainly because of the non-ASCII single-quote characters which gcc emits
in its warning/error messages.  I use LANG=C to stop gcc from doing that,
and I also have a proglet (below) to undo this nonsense when I'm merging
patches, although I totally forget how it works.  But I see such things
turning up in the tree via other merge paths.



#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>

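/*
 * buf is a three-byte sliding window over the input; a zero slot is empty,
 * so NUL bytes in the input are dropped.  If the window holds the UTF-8
 * encoding of a left or right single quote (E2 80 98 / E2 80 99), print the
 * ASCII replacement and empty the window.  Otherwise emit the oldest byte
 * (translating 0xa1 and 0xa2 to ASCII quotes) and slide the window by one.
 */
static void dump(int *buf)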
{
	if (buf[0] == 0xE2 && buf[1] == 0x80 && buf[2] == 0x98) {
		putchar('`');
		buf[0] = 0;
		buf[1] = 0;
		buf[2] = 0;
	} else if (buf[0] == 0xE2 && buf[1] == 0x80 && buf[2] == 0x99) {
		putchar('\'');
		buf[0] = 0;
		buf[1] = 0;
		buf[2] = 0;
	} else if (buf[0] == 0xa1) {
		putchar('`');
		goto move;
	} else if (buf[0] == 0xa2) {
		putchar('\'');
		goto move;
	} else {
		if (buf[0])
			putchar(buf[0]);
move:
		/* Slide the window left by one byte. */
		buf[0] = buf[1];
		buf[1] = buf[2];
		buf[2] = 0;
	}
}

static void doit(FILE *f)
{
	int buf[3] = { 0 };
	int c;

	while ((c = fgetc(f)) != EOF) {
		dump(buf);
		buf[2] = c;
	}
	/* Flush the bytes still held in the window. */
	dump(buf);
	dump(buf);
	dump(buf);
}

int main(int argc, char *argv[])
{
	if (argc == 1) {
		doit(stdin);
	} else {
		int i;

		for (i = 1; i < argc; i++) {
			FILE *f = fopen(argv[i], "r");

			if (f == NULL) {
				fprintf(stderr, "%s: cannot open `%s': %s\n",
					argv[0], argv[i], strerror(errno));
				exit(1);
			}
			doit(f);
			fclose(f);
		}
	}
	exit(0);
}
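
For comparison, the check described in the quoted patch (skip the initial
header-like lines, then look for non-ASCII bytes until the first signature
line) might be sketched in C roughly as follows.  This is only an
illustration under stated assumptions, not checkpatch's actual logic: the
function names are hypothetical, a "header" is assumed to be any line of the
form "Token: value", and a "signature" is assumed to be any line starting
with a "...-by:" tag.  One known weakness of this heuristic: a log that
opens with a line such as "warning: ..." would be mistaken for a header.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does line[0..len) look like "Token: value"? */
static int looks_like_header(const char *line, size_t len)
{
	size_t i = 0;

	while (i < len && (isalnum((unsigned char)line[i]) || line[i] == '-'))
		i++;
	return i > 0 && i + 1 < len && line[i] == ':' && line[i + 1] == ' ';
}

/* Hypothetical helper: does the line start with a "...-by:" tag? */
static int looks_like_signature(const char *line, size_t len)
{
	size_t i = 0;

	while (i < len && (isalnum((unsigned char)line[i]) || line[i] == '-'))
		i++;
	return i >= 4 && i < len && line[i] == ':' &&
	       memcmp(line + i - 3, "-by", 3) == 0;
}

/*
 * Return the 1-based number of the first commit-log line that contains a
 * byte >= 0x80, or 0 if no such line is found before the first signature
 * line (or the end of the message).
 */
static int first_non_ascii_log_line(const char *msg)
{
	int lineno = 0, in_log = 0;

	while (*msg) {
		const char *nl = strchr(msg, '\n');
		size_t len = nl ? (size_t)(nl - msg) : strlen(msg);
		size_t i;

		lineno++;
		/* The log starts at the first non-blank, non-header line. */
		if (!in_log && len > 0 && !looks_like_header(msg, len))
			in_log = 1;
		if (in_log) {
			if (looks_like_signature(msg, len))
				return 0;
			for (i = 0; i < len; i++)
				if ((unsigned char)msg[i] >= 0x80)
					return lineno;
		}
		msg += len + (nl ? 1 : 0);
	}
	return 0;
}
```

A caller would pass the whole patch email as one NUL-terminated string and
warn when the result is non-zero.  Non-ASCII bytes in the headers or in the
signature lines themselves (people's names, for instance) are deliberately
not reported.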


Thread overview: 5+ messages
2011-10-10  5:56 [PATCH] checkpatch: warn on found Change-Id lines Olof Johansson
2011-10-10 17:09 ` Joe Perches
2011-10-10 22:10   ` Andrew Morton
2011-10-10 22:32     ` [PATCH] checkpatch: Add a --strict check for utf-8 in commit logs Joe Perches
2011-10-10 22:42       ` Andrew Morton [this message]
