[PATCH] devtool: fix handling of unicode characters from subprocess stdout

public inbox for openembedded-core@lists.openembedded.org
 help / color / mirror / Atom feed

* [PATCH] devtool: fix handling of unicode characters from subprocess stdout
@ 2016-11-11  6:02 Jiajie Hu
  2016-11-11 12:22 ` Burton, Ross
  0 siblings, 1 reply; 5+ messages in thread
From: Jiajie Hu @ 2016-11-11  6:02 UTC (permalink / raw)
  To: openembedded-core

In previous implementation, a UnicodeDecodeError exception will be
raised if multi-byte encoded characters are printed by the subprocess.
As an example, the following command will fail in an en_US.UTF-8
environment because wget quotes its saving destination with '‘'(0xE2
0x80 0x98), while just the first byte is provided for decoding:

    devtool add recipe http://example.com/source.tar.xz

The patch fixes the issue by avoiding such kind of incomplete decoding.

Signed-off-by: Jiajie Hu <jiajie.hu@intel.com>
---
 scripts/lib/devtool/__init__.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/lib/devtool/__init__.py b/scripts/lib/devtool/__init__.py
index e675133..31ecb65 100644
--- a/scripts/lib/devtool/__init__.py
+++ b/scripts/lib/devtool/__init__.py
@@ -23,6 +23,7 @@ import sys
 import subprocess
 import logging
 import re
+import codecs
 
 logger = logging.getLogger('devtool')
 
@@ -67,10 +68,10 @@ def exec_watch(cmd, **options):
         cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, **options
     )
 
+    reader = codecs.getreader('utf-8')(process.stdout)
     buf = ''
     while True:
-        out = process.stdout.read(1)
-        out = out.decode('utf-8')
+        out = reader.read(1, 1)
         if out:
             sys.stdout.write(out)
             sys.stdout.flush()
-- 
1.9.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
  2016-11-11  6:02 [PATCH] devtool: fix handling of unicode characters from subprocess stdout Jiajie Hu
@ 2016-11-11 12:22 ` Burton, Ross
  2016-11-15  5:44   ` Hu, Jiajie
  2016-11-16 17:14   ` Stephano Cetola
  0 siblings, 2 replies; 5+ messages in thread
From: Burton, Ross @ 2016-11-11 12:22 UTC (permalink / raw)
  To: Jiajie Hu; +Cc: OE-core

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

On 11 November 2016 at 06:02, Jiajie Hu <jiajie.hu@intel.com> wrote:

> +    reader = codecs.getreader('utf-8')(process.stdout)
>      buf = ''
>      while True:
> -        out = process.stdout.read(1)
> -        out = out.decode('utf-8')
> +        out = reader.read(1, 1)
>

A reader is definitely the right thing here, but I'm wondering why this
needs to loop on single characters.  As I understand it doing a read() on a
reader wrapping stdout will read until it blocks (because the process
hasn't got anything to output) so result in less pointless iterating.

Ross

[-- Attachment #2: Type: text/html, Size: 1055 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
  2016-11-11 12:22 ` Burton, Ross
@ 2016-11-15  5:44   ` Hu, Jiajie
  2016-11-16 17:14   ` Stephano Cetola
  1 sibling, 0 replies; 5+ messages in thread
From: Hu, Jiajie @ 2016-11-15  5:44 UTC (permalink / raw)
  To: Burton, Ross; +Cc: OE-core

[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]

I tried to follow the original implementation here. It seems that a simple read() is blocking and the output of the subprocess cannot be redirected in time. E.g., ‘Note: Fetching …’ is not shown until the unpack is finished.

From: Burton, Ross [mailto:ross.burton@intel.com]
Sent: Friday, November 11, 2016 8:22 PM
To: Hu, Jiajie <jiajie.hu@intel.com>
Cc: OE-core <openembedded-core@lists.openembedded.org>
Subject: Re: [OE-core] [PATCH] devtool: fix handling of unicode characters from subprocess stdout

On 11 November 2016 at 06:02, Jiajie Hu <jiajie.hu@intel.com<mailto:jiajie.hu@intel.com>> wrote:
+    reader = codecs.getreader('utf-8')(process.stdout)
     buf = ''
     while True:
-        out = process.stdout.read(1)
-        out = out.decode('utf-8')
+        out = reader.read(1, 1)

A reader is definitely the right thing here, but I'm wondering why this needs to loop on single characters.  As I understand it doing a read() on a reader wrapping stdout will read until it blocks (because the process hasn't got anything to output) so result in less pointless iterating.

Ross

[-- Attachment #2: Type: text/html, Size: 4337 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
  2016-11-11 12:22 ` Burton, Ross
  2016-11-15  5:44   ` Hu, Jiajie
@ 2016-11-16 17:14   ` Stephano Cetola
  2016-11-16 22:21     ` Paul Eggleton
  1 sibling, 1 reply; 5+ messages in thread
From: Stephano Cetola @ 2016-11-16 17:14 UTC (permalink / raw)
  To: Burton, Ross; +Cc: OE-core

On 11/11, Burton, Ross wrote:
> A reader is definitely the right thing here, but I'm wondering why this
> needs to loop on single characters.  As I understand it doing a read() on a
> reader wrapping stdout will read until it blocks (because the process
> hasn't got anything to output) so result in less pointless iterating.
> 
> Ross

I've tested this and it fixes the issue, and resolves this bug:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10649

We may be able to stream this without blocking using
Queue.get_nowait(). This could solve the looping issue as well, as
you'd be looping over stdout.readline and placing that in the queue,
rather than looping over each character.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] devtool: fix handling of unicode characters from subprocess stdout
  2016-11-16 17:14   ` Stephano Cetola
@ 2016-11-16 22:21     ` Paul Eggleton
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Eggleton @ 2016-11-16 22:21 UTC (permalink / raw)
  To: openembedded-core

On Wed, 16 Nov 2016 09:14:20 Stephano Cetola wrote:
> On 11/11, Burton, Ross wrote:
> > A reader is definitely the right thing here, but I'm wondering why this
> > needs to loop on single characters.  As I understand it doing a read() on
> > a
> > reader wrapping stdout will read until it blocks (because the process
> > hasn't got anything to output) so result in less pointless iterating.
> 
> I've tested this and it fixes the issue, and resolves this bug:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=10649
> 
> We may be able to stream this without blocking using
> Queue.get_nowait(). This could solve the looping issue as well, as
> you'd be looping over stdout.readline and placing that in the queue,
> rather than looping over each character.

FWIW I'd rather see this one merged and the issue fixed and we can optimise it 
later, especially as the original version used a single-byte read.

Thanks Jiajie for fixing the issue I caused and Stephano for verifying.

Acked-by: Paul Eggleton <paul.eggleton@linux.intel.com>

Cheers,
Paul

-- 

Paul Eggleton
Intel Open Source Technology Centre


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-11-16 22:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-11  6:02 [PATCH] devtool: fix handling of unicode characters from subprocess stdout Jiajie Hu
2016-11-11 12:22 ` Burton, Ross
2016-11-15  5:44   ` Hu, Jiajie
2016-11-16 17:14   ` Stephano Cetola
2016-11-16 22:21     ` Paul Eggleton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox