Openembedded Bitbake Development
 help / color / mirror / Atom feed
* [PATCH 0/1] process.py, build.py: Fix log truncation problems with flush()
@ 2012-06-14 14:58 Jason Wessel
  2012-06-14 14:58 ` [PATCH 1/1] " Jason Wessel
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wessel @ 2012-06-14 14:58 UTC (permalink / raw)
  To: bitbake-devel

Separately Richard had asked me if this proposed patch to flush things
out to disk immediately had a negative performance impact.  It is
actually the opposite effect due to a very subtle problem that was
fixed.

My litmus test is to wipe the .o files, drop kernel caches, force
compile the kernel which will generate 100's of very small single line
logs under the size of the buffer limit.

Without the log flushing patch compiling a kernel:
   real    3m33.906s
   user    0m53.523s
   sys     0m24.518s

With the log flushing patch compiling a kernel:
   real    3m28.102s
   user    0m53.251s
   sys     0m24.982s

Why is it faster you ask?

It has to do with the alternating blocking reads on stdin and stderr.
First the existing code blocks on stdout until the buffer fills up in
the python logger.  Once the buffer is full, it will switch to
blocking on stderr, and if nothing ever comes it may wait all the way
until the task completes and then tries to chunk the buffer out to disk
at the end which is quite wasteful.  

Sometimes the existing behavior also leads to having any bitbake
injected error messages into the log file appear out of chronological
order with respect to how ever much data was in the log because bitbake
will emit the error to stderr and stderr could get flushed before
stdout.

Cheers,
Jason.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/1] process.py, build.py: Fix log truncation problems with flush()
  2012-06-14 14:58 [PATCH 0/1] process.py, build.py: Fix log truncation problems with flush() Jason Wessel
@ 2012-06-14 14:58 ` Jason Wessel
  2012-06-21 15:55   ` Richard Purdie
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wessel @ 2012-06-14 14:58 UTC (permalink / raw)
  To: bitbake-devel

There are two problems with the _logged_communicate that are both
caused as a result of buffering I/O, instead of flushing it out to the
log files as it arrives.

1) log truncation when python dumps
   I have seen the task logs missing data that was not flushed when
   bitbake crashes.

2) While a bitbake task is running it is impossible to see what is
   going on if it is only writing a small incremental log that is
   smaller than the buffer, or you get only a partial log, up until
   the task exists.  It is worse in the case that stderr and stdout
   are separate file handles, because previous code blocks on the read
   of stdout and then stderr, serially.

The right approach is simply to use select() to determine if there is
data available and then to flush the log buffers as they arrive.  This
still makes use of buffering in the cases where there is more than 1
byte of data, but the buffers are much more dynamic.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
---
 lib/bb/build.py   |    3 ++-
 lib/bb/process.py |   29 +++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/bb/build.py b/lib/bb/build.py
index a9ba02d..7e9bb23 100644
--- a/lib/bb/build.py
+++ b/lib/bb/build.py
@@ -135,7 +135,8 @@ class LogTee(object):
 
     def __repr__(self):
         return '<LogTee {0}>'.format(self.name)
-
+    def flush(self):
+        self.outfile.flush()
 
 def exec_func(func, d, dirs = None):
     """Execute an BB 'function'"""
diff --git a/lib/bb/process.py b/lib/bb/process.py
index b74cb18..9848fc3 100644
--- a/lib/bb/process.py
+++ b/lib/bb/process.py
@@ -1,6 +1,9 @@
 import logging
 import signal
 import subprocess
+import fcntl
+import errno
+import select
 
 logger = logging.getLogger('BitBake.Process')
 
@@ -70,18 +73,40 @@ def _logged_communicate(pipe, log, input):
 
     bufsize = 512
     outdata, errdata = [], []
+    rin = []
+
+    if pipe.stdout is not None:
+        fd = pipe.stdout.fileno()
+        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
+        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
+        rin.append(pipe.stdout)
+    if pipe.stderr is not None:
+        fd = pipe.stderr.fileno()
+        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
+        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
+        rin.append(pipe.stderr)
+
     while pipe.poll() is None:
-        if pipe.stdout is not None:
+        rlist = rin
+        try:
+            r,w,e = select.select (rlist, [], [])
+        except OSError, e:
+            if e.errno != errno.EINTR:
+                raise
+
+        if pipe.stdout in r:
             data = pipe.stdout.read(bufsize)
             if data is not None:
                 outdata.append(data)
                 log.write(data)
+                log.flush()
 
-        if pipe.stderr is not None:
+        if pipe.stderr in r:
             data = pipe.stderr.read(bufsize)
             if data is not None:
                 errdata.append(data)
                 log.write(data)
+                log.flush()
     return ''.join(outdata), ''.join(errdata)
 
 def run(cmd, input=None, log=None, **options):
-- 
1.7.10




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] process.py, build.py: Fix log truncation problems with flush()
  2012-06-14 14:58 ` [PATCH 1/1] " Jason Wessel
@ 2012-06-21 15:55   ` Richard Purdie
  2012-06-21 16:04     ` Jason Wessel
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Purdie @ 2012-06-21 15:55 UTC (permalink / raw)
  To: Jason Wessel; +Cc: bitbake-devel

On Thu, 2012-06-14 at 09:58 -0500, Jason Wessel wrote:
> There are two problems with the _logged_communicate that are both
> caused as a result of buffering I/O, instead of flushing it out to the
> log files as it arrives.
> 
> 1) log truncation when python dumps
>    I have seen the task logs missing data that was not flushed when
>    bitbake crashes.
> 
> 2) While a bitbake task is running it is impossible to see what is
>    going on if it is only writing a small incremental log that is
>    smaller than the buffer, or you get only a partial log, up until
>    the task exists.  It is worse in the case that stderr and stdout
>    are separate file handles, because previous code blocks on the read
>    of stdout and then stderr, serially.
> 
> The right approach is simply to use select() to determine if there is
> data available and then to flush the log buffers as they arrive.  This
> still makes use of buffering in the cases where there is more than 1
> byte of data, but the buffers are much more dynamic.
> 
> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
> ---
>  lib/bb/build.py   |    3 ++-
>  lib/bb/process.py |   29 +++++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 3 deletions(-)

For what its worth I'm seeing a small but consistent increase in real,
sys and user times with this patch which is why I'm pausing to look at
it a little further :(

Cheers,

Richard



> diff --git a/lib/bb/build.py b/lib/bb/build.py
> index a9ba02d..7e9bb23 100644
> --- a/lib/bb/build.py
> +++ b/lib/bb/build.py
> @@ -135,7 +135,8 @@ class LogTee(object):
>  
>      def __repr__(self):
>          return '<LogTee {0}>'.format(self.name)
> -
> +    def flush(self):
> +        self.outfile.flush()
>  
>  def exec_func(func, d, dirs = None):
>      """Execute an BB 'function'"""
> diff --git a/lib/bb/process.py b/lib/bb/process.py
> index b74cb18..9848fc3 100644
> --- a/lib/bb/process.py
> +++ b/lib/bb/process.py
> @@ -1,6 +1,9 @@
>  import logging
>  import signal
>  import subprocess
> +import fcntl
> +import errno
> +import select
>  
>  logger = logging.getLogger('BitBake.Process')
>  
> @@ -70,18 +73,40 @@ def _logged_communicate(pipe, log, input):
>  
>      bufsize = 512
>      outdata, errdata = [], []
> +    rin = []
> +
> +    if pipe.stdout is not None:
> +        fd = pipe.stdout.fileno()
> +        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
> +        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
> +        rin.append(pipe.stdout)
> +    if pipe.stderr is not None:
> +        fd = pipe.stderr.fileno()
> +        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
> +        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
> +        rin.append(pipe.stderr)
> +
>      while pipe.poll() is None:
> -        if pipe.stdout is not None:
> +        rlist = rin
> +        try:
> +            r,w,e = select.select (rlist, [], [])
> +        except OSError, e:
> +            if e.errno != errno.EINTR:
> +                raise
> +
> +        if pipe.stdout in r:
>              data = pipe.stdout.read(bufsize)
>              if data is not None:
>                  outdata.append(data)
>                  log.write(data)
> +                log.flush()
>  
> -        if pipe.stderr is not None:
> +        if pipe.stderr in r:
>              data = pipe.stderr.read(bufsize)
>              if data is not None:
>                  errdata.append(data)
>                  log.write(data)
> +                log.flush()
>      return ''.join(outdata), ''.join(errdata)
>  
>  def run(cmd, input=None, log=None, **options):





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] process.py, build.py: Fix log truncation problems with flush()
  2012-06-21 15:55   ` Richard Purdie
@ 2012-06-21 16:04     ` Jason Wessel
  2012-06-21 16:12       ` Richard Purdie
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wessel @ 2012-06-21 16:04 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

On 06/21/2012 10:55 AM, Richard Purdie wrote:
> On Thu, 2012-06-14 at 09:58 -0500, Jason Wessel wrote:
>> There are two problems with the _logged_communicate that are both
>> caused as a result of buffering I/O, instead of flushing it out to the
>> log files as it arrives.
>>
>> 1) log truncation when python dumps
>>    I have seen the task logs missing data that was not flushed when
>>    bitbake crashes.
>>
>> 2) While a bitbake task is running it is impossible to see what is
>>    going on if it is only writing a small incremental log that is
>>    smaller than the buffer, or you get only a partial log, up until
>>    the task exists.  It is worse in the case that stderr and stdout
>>    are separate file handles, because previous code blocks on the read
>>    of stdout and then stderr, serially.
>>
>> The right approach is simply to use select() to determine if there is
>> data available and then to flush the log buffers as they arrive.  This
>> still makes use of buffering in the cases where there is more than 1
>> byte of data, but the buffers are much more dynamic.
>>
>> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
>> ---
>>  lib/bb/build.py   |    3 ++-
>>  lib/bb/process.py |   29 +++++++++++++++++++++++++++--
>>  2 files changed, 29 insertions(+), 3 deletions(-)
> For what its worth I'm seeing a small but consistent increase in real,
> sys and user times with this patch which is why I'm pausing to look at
> it a little further :(


In my experience, it varied quite a bit.  If need be, we simply allow allow a config variable like BB_FORCE_LOG_FLUSH = "1", and get rid of the flushes by default.

At the distro level, I'll turn this on, but still allow users to override it, because timely logging of builds is critical, and loss of data due to a python crash is completely unacceptable, it just makes build failures even harder to diagnose.

Jason.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] process.py, build.py: Fix log truncation problems with flush()
  2012-06-21 16:04     ` Jason Wessel
@ 2012-06-21 16:12       ` Richard Purdie
  2012-06-21 16:26         ` Jason Wessel
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Purdie @ 2012-06-21 16:12 UTC (permalink / raw)
  To: Jason Wessel; +Cc: bitbake-devel

On Thu, 2012-06-21 at 11:04 -0500, Jason Wessel wrote:
> On 06/21/2012 10:55 AM, Richard Purdie wrote:
> > On Thu, 2012-06-14 at 09:58 -0500, Jason Wessel wrote:
> >> There are two problems with the _logged_communicate that are both
> >> caused as a result of buffering I/O, instead of flushing it out to the
> >> log files as it arrives.
> >>
> >> 1) log truncation when python dumps
> >>    I have seen the task logs missing data that was not flushed when
> >>    bitbake crashes.
> >>
> >> 2) While a bitbake task is running it is impossible to see what is
> >>    going on if it is only writing a small incremental log that is
> >>    smaller than the buffer, or you get only a partial log, up until
> >>    the task exists.  It is worse in the case that stderr and stdout
> >>    are separate file handles, because previous code blocks on the read
> >>    of stdout and then stderr, serially.
> >>
> >> The right approach is simply to use select() to determine if there is
> >> data available and then to flush the log buffers as they arrive.  This
> >> still makes use of buffering in the cases where there is more than 1
> >> byte of data, but the buffers are much more dynamic.
> >>
> >> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
> >> ---
> >>  lib/bb/build.py   |    3 ++-
> >>  lib/bb/process.py |   29 +++++++++++++++++++++++++++--
> >>  2 files changed, 29 insertions(+), 3 deletions(-)
> > For what its worth I'm seeing a small but consistent increase in real,
> > sys and user times with this patch which is why I'm pausing to look at
> > it a little further :(
> 
> 
> In my experience, it varied quite a bit.  If need be, we simply allow
> allow a config variable like BB_FORCE_LOG_FLUSH = "1", and get rid of
> the flushes by default.
> 
> At the distro level, I'll turn this on, but still allow users to
> override it, because timely logging of builds is critical, and loss of
> data due to a python crash is completely unacceptable, it just makes
> build failures even harder to diagnose.

Python has some better tools to handle things like this, such as
something like:

try:
    <normal case>
finally:
    x.flush()

so it would be interesting to see if that helps.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] process.py, build.py: Fix log truncation problems with flush()
  2012-06-21 16:12       ` Richard Purdie
@ 2012-06-21 16:26         ` Jason Wessel
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Wessel @ 2012-06-21 16:26 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

On 06/21/2012 11:12 AM, Richard Purdie wrote:
> On Thu, 2012-06-21 at 11:04 -0500, Jason Wessel wrote:
>> On 06/21/2012 10:55 AM, Richard Purdie wrote:
>>> On Thu, 2012-06-14 at 09:58 -0500, Jason Wessel wrote:
>>>> There are two problems with the _logged_communicate that are both
>>>> caused as a result of buffering I/O, instead of flushing it out to the
>>>> log files as it arrives.
>>>>
>>>> 1) log truncation when python dumps
>>>>    I have seen the task logs missing data that was not flushed when
>>>>    bitbake crashes.
>>>>
>>>> 2) While a bitbake task is running it is impossible to see what is
>>>>    going on if it is only writing a small incremental log that is
>>>>    smaller than the buffer, or you get only a partial log, up until
>>>>    the task exists.  It is worse in the case that stderr and stdout
>>>>    are separate file handles, because previous code blocks on the read
>>>>    of stdout and then stderr, serially.
>>>>
>>>> The right approach is simply to use select() to determine if there is
>>>> data available and then to flush the log buffers as they arrive.  This
>>>> still makes use of buffering in the cases where there is more than 1
>>>> byte of data, but the buffers are much more dynamic.
>>>>
>>>> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
>>>> ---
>>>>  lib/bb/build.py   |    3 ++-
>>>>  lib/bb/process.py |   29 +++++++++++++++++++++++++++--
>>>>  2 files changed, 29 insertions(+), 3 deletions(-)
>>> For what its worth I'm seeing a small but consistent increase in real,
>>> sys and user times with this patch which is why I'm pausing to look at
>>> it a little further :(
>>
>> In my experience, it varied quite a bit.  If need be, we simply allow
>> allow a config variable like BB_FORCE_LOG_FLUSH = "1", and get rid of
>> the flushes by default.
>>
>> At the distro level, I'll turn this on, but still allow users to
>> override it, because timely logging of builds is critical, and loss of
>> data due to a python crash is completely unacceptable, it just makes
>> build failures even harder to diagnose.
> Python has some better tools to handle things like this, such as
> something like:
>
> try:
>     <normal case>
> finally:
>     x.flush()
>
>

There is a secondary case you have to consider.   There is the case where I want the data "now" like in the real time logging, or tailing of log files.

As for your example above, the core log framework needs some kind of major change in this case.  When the task engine crashes for example.  The data in the buffers is gone with the crash.  Perhaps we are talking about two different things however.

Jason.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-06-21 16:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-14 14:58 [PATCH 0/1] process.py, build.py: Fix log truncation problems with flush() Jason Wessel
2012-06-14 14:58 ` [PATCH 1/1] " Jason Wessel
2012-06-21 15:55   ` Richard Purdie
2012-06-21 16:04     ` Jason Wessel
2012-06-21 16:12       ` Richard Purdie
2012-06-21 16:26         ` Jason Wessel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox