Message-ID: <1442190770.2298.31.camel@decadent.org.uk>
Subject: Re: [PATCH 2/2] DocBook: Use a fixed encoding for output
From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 14 Sep 2015 01:32:50 +0100
In-Reply-To: <20150911133059.77455c16@lwn.net>
References: <1441147632.9215.42.camel@decadent.org.uk>
	 <1441147759.9215.44.camel@decadent.org.uk>
	 <20150911133059.77455c16@lwn.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Mime-Version: 1.0
Sender: linux-kbuild-owner@vger.kernel.org
List-ID: <linux-kbuild.vger.kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Jérémy Bobbio <lunar@debian.org>, reproducible-builds@lists.alioth.debian.org, linux-doc@vger.kernel.org, Randy Dunlap <rdunlap@infradead.org>, Michal Marek <mmarek@suse.com>, linux-kbuild <linux-kbuild@vger.kernel.org>

On Fri, 2015-09-11 at 13:30 -0600, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings <ben@decadent.org.uk> wrote:
> 
> > Currently the encoding of documents generated by DocBook depends on
> > the current locale.  Make the output reproducible independently of
> > the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> > preference, or ASCII (LC_CTYPE=C) as a fallback.
> 
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do?  Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
> 
> Am I missing something here?

Yes - the locale's character encoding applies to plain text, but rich
text formats can have a locale-independent encoding which the viewer
will automatically convert to the current locale's encoding.
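As a rough illustration of that conversion step, here is a minimal Python
sketch. The function name reencode_for_locale is hypothetical, and
replacing unrepresentable characters with '?' is an assumed viewer
policy, not something any particular viewer is known to do. The point is
that the document is decoded using its own declared encoding, and the
locale only matters for the final re-encoding.

```python
def reencode_for_locale(doc_bytes: bytes, doc_encoding: str, locale_encoding: str) -> bytes:
    """Convert a document from its declared encoding to the viewer's locale encoding.

    Hypothetical helper. Characters the locale encoding cannot represent are
    replaced with '?' (an assumed policy) instead of raising an error.
    """
    # Decode using the encoding the document itself declares; no locale involved.
    text = doc_bytes.decode(doc_encoding)
    # Re-encode for display in whatever encoding the current locale uses.
    return text.encode(locale_encoding, errors="replace")


if __name__ == "__main__":
    doc = "Jérémy Bobbio".encode("utf-8")
    print(reencode_for_locale(doc, "utf-8", "iso-8859-1"))  # b'J\xe9r\xe9my Bobbio'
    print(reencode_for_locale(doc, "utf-8", "ascii"))       # b'J?r?my Bobbio'
```

The same UTF-8 bytes serve every user; only the viewer's output differs.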

For HTML, the document encoding can be explicit in the document header
(and is, in this case).
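A viewer can find that declaration without consulting the locale at all.
The Python sketch below (declared_charset is a hypothetical helper) looks
for a <meta> charset declaration in the first 1024 bytes, which is where
the current HTML specification requires it to appear. The exact markup
DocBook-XSL emits is assumed to be one of the two forms the regex accepts.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_CHARSET_RE = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def declared_charset(html: bytes) -> Optional[str]:
    """Return the charset declared in an HTML document's <meta> tag, or None."""
    # The declaration must appear within the first 1024 bytes of the document.
    match = _CHARSET_RE.search(html[:1024])
    if match is None:
        return None
    # The captured name contains only ASCII characters, so this cannot fail.
    return match.group(1).decode("ascii")
```

The returned name can be passed straight to bytes.decode(), independent of
LC_CTYPE.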

Manual pages were already consistently encoded in UTF-8, as this is the
default behaviour of DocBook-XSL (and is what man-db prefers as input).

PDF and PostScript documents have arbitrary and explicit mappings from
character numbers (or names) to glyphs, and PDF documents normally have
a mapping from glyphs back to Unicode code points to support searching
and copying text.
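To make the last point concrete: in PDF that reverse mapping is a
ToUnicode CMap, whose entries map glyph codes to UTF-16BE text. The Python
sketch below (parse_bfchar is a hypothetical name) handles only the simple
beginbfchar/endbfchar form and ignores bfrange entries, so it is
illustrative rather than a complete CMap parser.

```python
import re
from typing import Dict

# A bfchar section: "N beginbfchar <src> <dst> ... endbfchar".
_BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.DOTALL)
# One "<srcHex> <dstHex>" pair inside a bfchar section.
_HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Map glyph codes to Unicode text using the bfchar entries of a ToUnicode CMap."""
    mapping: Dict[int, str] = {}
    for block in _BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in _HEX_PAIR.findall(block):
            # Destination strings are UTF-16BE in hex, so a single glyph can
            # map to several characters (e.g. a ligature) or a surrogate pair.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping
```

Searching or copying text then amounts to joining mapping[code] for each
glyph code in the page's content stream.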

Ben.