From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:52732
	"EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752443AbbINAdH convert rfc822-to-8bit
	(ORCPT ); Sun, 13 Sep 2015 20:33:07 -0400
Message-ID: <1442190770.2298.31.camel@decadent.org.uk>
Subject: Re: [PATCH 2/2] DocBook: Use a fixed encoding for output
From: Ben Hutchings
Date: Mon, 14 Sep 2015 01:32:50 +0100
In-Reply-To: <20150911133059.77455c16@lwn.net>
References: <1441147632.9215.42.camel@decadent.org.uk>
	<1441147759.9215.44.camel@decadent.org.uk>
	<20150911133059.77455c16@lwn.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Mime-Version: 1.0
Sender: linux-kbuild-owner@vger.kernel.org
List-ID:
To: Jonathan Corbet
Cc: =?ISO-8859-1?Q?J=E9r=E9my?= Bobbio ,
	reproducible-builds@lists.alioth.debian.org,
	linux-doc@vger.kernel.org, Randy Dunlap , Michal Marek ,
	linux-kbuild

On Fri, 2015-09-11 at 13:30 -0600, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings wrote:
>
> > Currently the encoding of documents generated by DocBook depends on
> > the current locale. Make the output reproducible independently of
> > the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> > preference, or ASCII (LC_CTYPE=C) as a fallback.
>
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do? Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
>
> Am I missing something here?

Yes - the locale's character encoding applies to plain text, but rich
text formats can have a locale-independent encoding which the viewer
will automatically convert to the current locale's encoding.

For HTML, the document encoding can be explicit in the document header
(and is, in this case).
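To illustrate the HTML case, here is a minimal Python sketch of how a
viewer can decode a page from the encoding the document declares, without
consulting the locale at all. The sample page and the helper names
(declared_charset, decode_document) are hypothetical and only for
illustration; real browsers follow a more elaborate detection procedure.

```python
import re

# Hypothetical sample page: the header declares the document's own encoding.
HTML_TEXT = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8"><title>Jérémy</title></head>'
    '<body>naïve café</body></html>'
)


def declared_charset(raw: bytes, default: str = "ascii") -> str:
    """Return the charset named in the document header, or the default."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_.:-]+)', raw, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default


def decode_document(raw: bytes) -> str:
    """Decode with the encoding the document declares, not the locale's."""
    return raw.decode(declared_charset(raw))


# Bytes as a fixed-encoding build would write them, whatever the locale is.
raw = HTML_TEXT.encode("utf-8")
text = decode_document(raw)
print(declared_charset(raw), text == HTML_TEXT)
```

Because the declaration travels with the file, the same bytes decode to
the same text on a Latin-1 system and on a UTF-8 one; only the final
rendering step depends on the viewer's environment.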
Manual pages were already consistently encoded in UTF-8, as this is the
default behaviour of DocBook-XSL (and is what man-db prefers as input).

PDF and PostScript documents have arbitrary and explicit mappings from
character numbers (or names) to glyphs, and PDF documents normally have
a mapping from glyphs back to Unicode code points to support searching
and copying text.

Ben.