* [PATCH] devtool: fix handling of unicode characters from subprocess stdout
@ 2016-11-11 6:02 Jiajie Hu
2016-11-11 12:22 ` Burton, Ross
0 siblings, 1 reply; 5+ messages in thread
From: Jiajie Hu @ 2016-11-11 6:02 UTC (permalink / raw)
To: openembedded-core
In previous implementation, a UnicodeDecodeError exception will be
raised if multi-byte encoded characters are printed by the subprocess.
As an example, the following command will fail in an en_US.UTF-8
environment because wget quotes its saving destination with '‘'(0xE2
0x80 0x98), while just the first byte is provided for decoding:
devtool add recipe http://example.com/source.tar.xz
The patch fixes the issue by avoiding such kind of incomplete decoding.
Signed-off-by: Jiajie Hu <jiajie.hu@intel.com>
---
scripts/lib/devtool/__init__.py | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/lib/devtool/__init__.py b/scripts/lib/devtool/__init__.py
index e675133..31ecb65 100644
--- a/scripts/lib/devtool/__init__.py
+++ b/scripts/lib/devtool/__init__.py
@@ -23,6 +23,7 @@ import sys
import subprocess
import logging
import re
+import codecs
logger = logging.getLogger('devtool')
@@ -67,10 +68,10 @@ def exec_watch(cmd, **options):
cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, **options
)
+ reader = codecs.getreader('utf-8')(process.stdout)
buf = ''
while True:
- out = process.stdout.read(1)
- out = out.decode('utf-8')
+ out = reader.read(1, 1)
if out:
sys.stdout.write(out)
sys.stdout.flush()
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
2016-11-11 6:02 [PATCH] devtool: fix handling of unicode characters from subprocess stdout Jiajie Hu
@ 2016-11-11 12:22 ` Burton, Ross
2016-11-15 5:44 ` Hu, Jiajie
2016-11-16 17:14 ` Stephano Cetola
0 siblings, 2 replies; 5+ messages in thread
From: Burton, Ross @ 2016-11-11 12:22 UTC (permalink / raw)
To: Jiajie Hu; +Cc: OE-core
[-- Attachment #1: Type: text/plain, Size: 588 bytes --]
On 11 November 2016 at 06:02, Jiajie Hu <jiajie.hu@intel.com> wrote:
> + reader = codecs.getreader('utf-8')(process.stdout)
> buf = ''
> while True:
> - out = process.stdout.read(1)
> - out = out.decode('utf-8')
> + out = reader.read(1, 1)
>
A reader is definitely the right thing here, but I'm wondering why this
needs to loop on single characters. As I understand it doing a read() on a
reader wrapping stdout will read until it blocks (because the process
hasn't got anything to output) so result in less pointless iterating.
Ross
[-- Attachment #2: Type: text/html, Size: 1055 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
2016-11-11 12:22 ` Burton, Ross
@ 2016-11-15 5:44 ` Hu, Jiajie
2016-11-16 17:14 ` Stephano Cetola
1 sibling, 0 replies; 5+ messages in thread
From: Hu, Jiajie @ 2016-11-15 5:44 UTC (permalink / raw)
To: Burton, Ross; +Cc: OE-core
[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]
I tried to follow the original implementation here. It seems that a simple read() is blocking and the output of the subprocess cannot be redirected in time. E.g., ‘Note: Fetching …’ is not shown until the unpack is finished.
From: Burton, Ross [mailto:ross.burton@intel.com]
Sent: Friday, November 11, 2016 8:22 PM
To: Hu, Jiajie <jiajie.hu@intel.com>
Cc: OE-core <openembedded-core@lists.openembedded.org>
Subject: Re: [OE-core] [PATCH] devtool: fix handling of unicode characters from subprocess stdout
On 11 November 2016 at 06:02, Jiajie Hu <jiajie.hu@intel.com<mailto:jiajie.hu@intel.com>> wrote:
+ reader = codecs.getreader('utf-8')(process.stdout)
buf = ''
while True:
- out = process.stdout.read(1)
- out = out.decode('utf-8')
+ out = reader.read(1, 1)
A reader is definitely the right thing here, but I'm wondering why this needs to loop on single characters. As I understand it doing a read() on a reader wrapping stdout will read until it blocks (because the process hasn't got anything to output) so result in less pointless iterating.
Ross
[-- Attachment #2: Type: text/html, Size: 4337 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
2016-11-11 12:22 ` Burton, Ross
2016-11-15 5:44 ` Hu, Jiajie
@ 2016-11-16 17:14 ` Stephano Cetola
2016-11-16 22:21 ` Paul Eggleton
1 sibling, 1 reply; 5+ messages in thread
From: Stephano Cetola @ 2016-11-16 17:14 UTC (permalink / raw)
To: Burton, Ross; +Cc: OE-core
On 11/11, Burton, Ross wrote:
> A reader is definitely the right thing here, but I'm wondering why this
> needs to loop on single characters. As I understand it doing a read() on a
> reader wrapping stdout will read until it blocks (because the process
> hasn't got anything to output) so result in less pointless iterating.
>
> Ross
I've tested this and it fixes the issue, and resolves this bug:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10649
We may be able to stream this without blocking using
Queue.get_nowait(). This could solve the looping issue as well, as
you'd be looping over stdout.readline and placing that in the queue,
rather than looping over each character.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
2016-11-16 17:14 ` Stephano Cetola
@ 2016-11-16 22:21 ` Paul Eggleton
0 siblings, 0 replies; 5+ messages in thread
From: Paul Eggleton @ 2016-11-16 22:21 UTC (permalink / raw)
To: openembedded-core
On Wed, 16 Nov 2016 09:14:20 Stephano Cetola wrote:
> On 11/11, Burton, Ross wrote:
> > A reader is definitely the right thing here, but I'm wondering why this
> > needs to loop on single characters. As I understand it doing a read() on
> > a
> > reader wrapping stdout will read until it blocks (because the process
> > hasn't got anything to output) so result in less pointless iterating.
>
> I've tested this and it fixes the issue, and resolves this bug:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=10649
>
> We may be able to stream this without blocking using
> Queue.get_nowait(). This could solve the looping issue as well, as
> you'd be looping over stdout.readline and placing that in the queue,
> rather than looping over each character.
FWIW I'd rather see this one merged and the issue fixed and we can optimise it
later, especially as the original version used a single-byte read.
Thanks Jiajie for fixing the issue I caused and Stephano for verifying.
Acked-by: Paul Eggleton <paul.eggleton@linux.intel.com>
Cheers,
Paul
--
Paul Eggleton
Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-11-16 22:21 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-11 6:02 [PATCH] devtool: fix handling of unicode characters from subprocess stdout Jiajie Hu
2016-11-11 12:22 ` Burton, Ross
2016-11-15 5:44 ` Hu, Jiajie
2016-11-16 17:14 ` Stephano Cetola
2016-11-16 22:21 ` Paul Eggleton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox