All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: "Michal Suchánek" <msuchanek@suse.de>
Cc: Markus Heiser <markus.heiser@darmarit.de>,
	linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Fri, 7 May 2021 10:52:15 +0200	[thread overview]
Message-ID: <20210507105215.0902461d@coco.lan> (raw)
In-Reply-To: <20210506180625.GI6564@kitsune.suse.cz>

Em Thu, 6 May 2021 20:06:25 +0200
Michal Suchánek <msuchanek@suse.de> escreveu:

> On Thu, May 06, 2021 at 07:53:25PM +0200, Markus Heiser wrote:

> > Hi Mauro,
> > 
> > it is not comfortable but is it mad? ..
> > 
> > Most often languages (or applications) do not handle encoding
> > of strings they just piping a binary stream while python
> > decode / encodes strings.
> > 
> > "The Zen of Python" [1] says
> > 
> >    Explicit is better than implicit.

This was taken into an extreme with regards to charsets:

	 "better" should never be translated to "crash" ;-)

> > If a stream can't encode symbols and these symbols should be ignored
> > you have to set the encoding of the stream explicit to ignore
> > such symbols.  
> 
> The problem is this part never happened. Loggers are supposed to tell
> you about the error in your application, not crash it.

It is insane to crash the error log due to a charset issue ;-)

> But the problem with Sphinx may be that the output file is also assumed
> to be in the locale encoding, and the output encoding is never set. It's
> HTML so it could be encoded with entities, too.
> 
> The idea about handlinng encoding precisely is not mad in itself but then
> everybody working with just ASCII and never testing their software works
> in the cases where explicit handling is needed is the mad part. 

True. The machine's locale shouldn't affect *at all* the produced
documents. See, there's a hole set of non-latin family of charsets
supported on Linux:

	https://man7.org/linux/man-pages/man7/charsets.7.html

Nothing prevents that someone using a machine whose default encoding is
KOI8-R/BIG-5/GB 2312/JIS X 0208/... to use Sphinx to produce 
UTF-8 [1] documents.

[1] or whatever other output encoding

Ok, the logger may not be able to correctly display certain
chars, but it it be perfectly fine and sane to use //TRANSLIT (or
something similar) in order to do a charset conversion. 

Even to just print a <?> for all chars that aren't printable at
the logger's output using the charset set by LANG/LC_* is 
better/saner than crashing.

Thanks,
Mauro

  reply	other threads:[~2021-05-07  8:52 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06 10:39 Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256) Michal Suchánek
2021-05-06 11:20 ` Mauro Carvalho Chehab
2021-05-06 13:32   ` Michal Suchánek
2021-05-06 14:24     ` Mauro Carvalho Chehab
2021-05-06 14:35       ` Michal Suchánek
2021-05-06 15:57 ` Markus Heiser
2021-05-06 16:46   ` Mauro Carvalho Chehab
2021-05-06 17:04     ` Markus Heiser
2021-05-06 17:27       ` Mauro Carvalho Chehab
2021-05-06 17:53         ` Markus Heiser
2021-05-06 18:06           ` Michal Suchánek
2021-05-07  8:52             ` Mauro Carvalho Chehab [this message]
2021-05-06 17:57         ` Randy Dunlap
2021-05-06 18:08           ` Matthew Wilcox
2021-05-06 21:21             ` Randy Dunlap
2021-05-07  6:39               ` Mauro Carvalho Chehab
2021-05-07  6:49                 ` Randy Dunlap
2021-05-07  8:04                 ` Mauro Carvalho Chehab
2021-05-07  8:35                   ` Michal Suchánek
2021-05-07  8:56                     ` Markus Heiser
2021-05-07  9:14                       ` Mauro Carvalho Chehab
2021-05-07  9:51                         ` Markus Heiser
2021-05-07 10:29                           ` Michal Suchánek
2021-05-07  9:02                     ` Mauro Carvalho Chehab
2021-05-08  9:22                 ` Mauro Carvalho Chehab
2021-05-08 10:41                   ` Michal Suchánek
2021-05-08 14:41                     ` Mauro Carvalho Chehab
2021-05-08 15:55                       ` Randy Dunlap
2021-05-08 17:09                         ` Michal Suchánek
2021-05-08 17:46                           ` Randy Dunlap
2021-05-10  6:22                             ` Mauro Carvalho Chehab
2021-05-10  8:17                         ` Mauro Carvalho Chehab
2021-05-06 17:48       ` Michal Suchánek
2021-05-06 17:59         ` Markus Heiser
2021-05-06 18:16           ` Michal Suchánek
2021-05-12  6:22         ` Mauro Carvalho Chehab
2021-05-12  7:01           ` Michal Suchánek
2021-05-12  7:18             ` Markus Heiser
2021-05-12  7:37               ` Markus Heiser
2021-05-12  7:59             ` Mauro Carvalho Chehab
2021-05-17 13:10               ` Michal Suchánek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210507105215.0902461d@coco.lan \
    --to=mchehab@kernel.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=markus.heiser@darmarit.de \
    --cc=msuchanek@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.