From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: Randy Dunlap <rdunlap@infradead.org>
Cc: "Matthew Wilcox" <willy@infradead.org>,
	"Markus Heiser" <markus.heiser@darmarit.de>,
	"Michal Suchánek" <msuchanek@suse.de>,
	linux-doc@vger.kernel.org, "Jonathan Corbet" <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Fri, 7 May 2021 08:39:24 +0200
Message-ID: <20210507083924.7b8ec1fe@coco.lan>
In-Reply-To: <be21de46-6655-152e-e431-144c2be6137c@infradead.org>

On Thu, 6 May 2021 14:21:01 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 5/6/21 11:08 AM, Matthew Wilcox wrote:
> > On Thu, May 06, 2021 at 10:57:53AM -0700, Randy Dunlap wrote:  
> >> I have been going thru some of the Documentation/ files...
> >>
> >> Why do several of the files begin with
> >> (hex) ef bb bf    followed by "=================="
> >> for a heading, instead of just "===================".
> >> See e.g. Documentation/timers/no_hz.rst.  

No idea! It seems that the text editor I used at that time added
it for whatever reason.

> > 
> > 00000000  ef bb bf 3d 3d 3d 3d 3d  3d 3d 3d 3d 3d 3d 3d 3d  |...=============|
> > 
> > ef bb bf is utf8 for 0b1111'111011'111111 = 0xFEFF which is the
> > https://en.wikipedia.org/wiki/Byte_order_mark
> > 
> > We should delete it.
> >   
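
For reference, the decoding above can be checked with a short Python
sketch. The helper names (decode_3byte_utf8, strip_utf8_bom) and the
sample bytes are made up for illustration; only codecs.BOM_UTF8 comes
from the standard library. This is one possible way to drop the mark,
not necessarily how the files will actually be fixed.

```python
import codecs


def decode_3byte_utf8(b0: int, b1: int, b2: int) -> int:
    """Decode a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (ef bb bf), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# ef bb bf carries the payload bits 1111 111011 111111, i.e. U+FEFF.
print(hex(decode_3byte_utf8(0xEF, 0xBB, 0xBF)))

# Hypothetical file content: a BOM followed by a ReST heading underline.
sample = b"\xef\xbb\xbf=====\nTitle\n=====\n"
print(strip_utf8_bom(sample))
```

Files without the mark pass through unchanged, so running such a helper
over a whole directory should be harmless.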
> 
> OK, thanks, I have started on that.
> 
> 
> Just another question: ("inquiring minds want to know")
> 
> Why is/are some docs using U+2217 '∗' instead of ASCII '*'?
> E.g., Documentation/block/cdrom-standard.rst.

The cdrom doc is a very special case: it was originally written in LaTeX.
I don't recall any other LaTeX document in the kernel docs among the
conversions I made. See:
	e327cfcb2542 ("docs: cdrom-standard.tex: convert from LaTeX to ReST")

In order to convert it to .rst, I used some tool to first turn it
into plain text (probably LaTeX, but I don't remember anymore), and then
I manually reviewed the entire file, adding ReST tags where needed.

I didn't realize that UTF-8 characters had been used in place of plain
ASCII ones, as both look the same when editing the file[1].

[1] I use Fedora here. Fedora changed the default charset to UTF-8 a long
    time ago.
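
Since such characters are hard to spot by eye, a small Python sketch
like the one below could list them. The function name find_non_ascii
and the sample text are hypothetical; unicodedata.name() is from the
standard library.

```python
import unicodedata


def find_non_ascii(text: str):
    """Yield (line, column, char, Unicode name) for each non-ASCII char."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to "UNKNOWN" for code points without a name.
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


# Made-up sample: an ASTERISK OPERATOR and an o with diaeresis.
sample = "a \u2217 b\nx\u00f6y\n"
for hit in find_non_ascii(sample):
    print(hit)
```

To run it on a real file, read the file with encoding="utf-8" and pass
the resulting string to the function.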

Anyway, we should be able to get rid of the weird UTF-8 chars in it with:

	$ iconv -f utf-8 -t ascii//TRANSLIT Documentation/cdrom/cdrom-standard.rst

I'll prepare a patch fixing it. Some care should be taken, however, as
the file has two places where non-ASCII characters must be kept[2].

[2] Two German personal names use non-ASCII characters:
    - 'o' + umlaut;
    - a LATIN SMALL LETTER SHARP S (Eszett)
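
Converting the whole file to ASCII cannot leave those two letters
untouched, so one option is a selective replacement. The Python sketch
below is only an illustration: the REPLACEMENTS table and the KEEP set
are assumptions that would need to be adjusted after inspecting the
file, and to_ascii_except is a hypothetical helper, not part of any
existing tool.

```python
# Hypothetical table: extend it after checking what the file contains.
REPLACEMENTS = {
    "\u2217": "*",   # ASTERISK OPERATOR
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u00a0": " ",   # NO-BREAK SPACE
}

# Characters that must survive: o with umlaut and sharp s.
KEEP = {"\u00f6", "\u00df"}


def to_ascii_except(text, keep=KEEP, table=REPLACEMENTS):
    """Replace known non-ASCII chars, keep the allowed ones.

    Returns the new text plus the set of non-ASCII characters that were
    neither kept on purpose nor found in the table; those are left
    unchanged so they can be reviewed by hand.
    """
    out = []
    unknown = set()
    for ch in text:
        if ord(ch) < 128 or ch in keep:
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        else:
            out.append(ch)
            unknown.add(ch)
    return "".join(out), unknown


# Made-up sample line, not taken from the real document.
fixed, leftovers = to_ascii_except("a \u2217 b, \u00f6 and \u00df stay")
print(fixed)
print(leftovers)
```

Anything reported in the second return value still needs a manual
decision, which matches the manual review such a patch needs anyway.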

Thanks,
Mauro
