git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Work around performance bug in subprocess.Popen.communicate()
@ 2009-07-31  9:36 Karl Wiberg
  2009-07-31 11:27 ` Catalin Marinas
  0 siblings, 1 reply; 7+ messages in thread
From: Karl Wiberg @ 2009-07-31  9:36 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: mandolaerik, git

From: Karl Wiberg <kha@treskal.com>

In Python 2.4 (specifically, I tested with 2.4.6 on Ubuntu 9.04),
subprocess.Popen.communicate() seems to take time proportional to the
square of the size of the indata, which makes it ridiculously
expensive to write stack log entries when the diffs are large. Work
around the bug by calling subprocess.Popen.stdin.write() manually
instead of letting communicate() handle the indata.

The performance bug has been fixed in Python 2.6 (I tested with
2.6.2), so with that version this workaround doesn't affect the run
time. I haven't tested with Python 2.5.

This fixes bug 13319.

Signed-off-by: Karl Wiberg <kha@treskal.com>

---

 stgit/run.py |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)


diff --git a/stgit/run.py b/stgit/run.py
index 7493ed3..311d12f 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -110,7 +110,9 @@ class Run:
                                  stdin = subprocess.PIPE,
                                  stdout = subprocess.PIPE,
                                  stderr = subprocess.PIPE)
-            outdata, errdata = p.communicate(self.__indata)
+            if self.__indata:
+                p.stdin.write(self.__indata)
+            outdata, errdata = p.communicate()
             self.exitcode = p.returncode
         except OSError, e:
             raise self.exc('%s failed: %s' % (self.__cmd[0], e))

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-07-31  9:36 [PATCH] Work around performance bug in subprocess.Popen.communicate() Karl Wiberg
@ 2009-07-31 11:27 ` Catalin Marinas
  2009-08-04  8:51   ` Karl Wiberg
  0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2009-07-31 11:27 UTC (permalink / raw)
  To: Karl Wiberg; +Cc: mandolaerik, git

2009/7/31 Karl Wiberg <kha@virtutech.com>:
> @@ -110,7 +110,9 @@ class Run:
>                                  stdin = subprocess.PIPE,
>                                  stdout = subprocess.PIPE,
>                                  stderr = subprocess.PIPE)
> -            outdata, errdata = p.communicate(self.__indata)
> +            if self.__indata:
> +                p.stdin.write(self.__indata)
> +            outdata, errdata = p.communicate()
>             self.exitcode = p.returncode
>         except OSError, e:
>             raise self.exc('%s failed: %s' % (self.__cmd[0], e))

But can this not lead to a deadlock if the __indata is big? The stdout
of the created process is only processed once the whole __indata is
written. I thought communicate() was created to avoid this issue.

-- 
Catalin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-07-31 11:27 ` Catalin Marinas
@ 2009-08-04  8:51   ` Karl Wiberg
  2009-08-13 22:18     ` Catalin Marinas
  0 siblings, 1 reply; 7+ messages in thread
From: Karl Wiberg @ 2009-08-04  8:51 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: mandolaerik, git

On 2009-07-31 12:27:53 +0100, Catalin Marinas wrote:

> But can this not lead to a deadlock if the __indata is big? The
> stdout of the created process is only processed once the whole
> __indata is written. I thought communicate() was created to avoid
> this issue.

I don't think there's a problem. write() isn't supposed to have a
limit on the amount of data it will accept in one call, as far as I'm
aware. Plus, it works just fine with Erik's test case---which in my
case was about 7 MB. If it can handle 7 MB, I doubt there's a limit
we'll hit anytime soon.

Oh, and we still call communicate()---we just don't pass it any
additional bytes to write to stdin.

-- 
Karl Wiberg, Virtutech

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-08-04  8:51   ` Karl Wiberg
@ 2009-08-13 22:18     ` Catalin Marinas
  2009-08-14  6:21       ` Karl Wiberg
  0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2009-08-13 22:18 UTC (permalink / raw)
  To: Karl Wiberg; +Cc: mandolaerik, git

2009/8/4 Karl Wiberg <kha@virtutech.com>:
> On 2009-07-31 12:27:53 +0100, Catalin Marinas wrote:
>
>> But can this not lead to a deadlock if the __indata is big? The
>> stdout of the created process is only processed once the whole
>> __indata is written. I thought communicate() was created to avoid
>> this issue.
>
> I don't think there's a problem. write() isn't supposed to have a
> limit on the amount of data it will accept in one call, as far as I'm
> aware. Plus, it works just fine with Erik's test case---which in my
> case was about 7 MB. If it can handle 7 MB, I doubt there's a limit
> we'll hit anytime soon.

write() itself doesn't have a limit, it's mainly what the application
receiving the data can handle. In the Git case, I think it takes all
the input as it isn't a filtering tool (things may be different with
tools like sed etc.).

> Oh, and we still call communicate()---we just don't pass it any
> additional bytes to write to stdin.

Yes, but if write() is blocked, communicate() won't be called.

Since we are only using Git, I'll merge this patch (and maybe add a comment).

-- 
Catalin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-08-13 22:18     ` Catalin Marinas
@ 2009-08-14  6:21       ` Karl Wiberg
  2009-08-14  7:26         ` Erik Sandberg
  0 siblings, 1 reply; 7+ messages in thread
From: Karl Wiberg @ 2009-08-14  6:21 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: mandolaerik, git

On 2009-08-13 23:18:59 +0100, Catalin Marinas wrote:

> 2009/8/4 Karl Wiberg <kha@virtutech.com>:
>
> > On 2009-07-31 12:27:53 +0100, Catalin Marinas wrote:
> >
> > > But can this not lead to a deadlock if the __indata is big? The
> > > stdout of the created process is only processed once the whole
> > > __indata is written. I thought communicate() was created to
> > > avoid this issue.
> >
> > I don't think there's a problem. write() isn't supposed to have a
> > limit on the amount of data it will accept in one call, as far as
> > I'm aware. Plus, it works just fine with Erik's test case---which
> > in my case was about 7 MB. If it can handle 7 MB, I doubt there's
> > a limit we'll hit anytime soon.
>
> write() itself doesn't have a limit, it's mainly what the
> application receiving the data can handle. In the Git case, I think
> it takes all the input as it isn't a filtering tool (things may be
> different with tools like sed etc.).
>
> > Oh, and we still call communicate()---we just don't pass it any
> > additional bytes to write to stdin.
>
> Yes, but if write() is blocked, communicate() won't be called.

Ah, so that's what you were worrying about. Yeah, if the process we're
writing to was going to block us (== not read input) for long periods
of time, we'd have to handle that case. And if it was going to decide
to stop reading input half-way through and give us a sigpipe, we'd
have to handle that. But ...

> Since we are only using Git, I'll merge this patch

Exactly. We really don't expect git to do any of those things to us.
We know it reads all available input before producing the output.
(Well, almost always. I have patches in my experimental branch that
use git cat-file --batch and git diff-tree --stdin, where there can be
many rounds of talking back and forth, but when driving those I'm
being very careful to not deadlock.)

> (and maybe add a comment).

Sure.

-- 
Karl Wiberg, Virtutech

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-08-14  6:21       ` Karl Wiberg
@ 2009-08-14  7:26         ` Erik Sandberg
  2009-08-14  8:12           ` Karl Wiberg
  0 siblings, 1 reply; 7+ messages in thread
From: Erik Sandberg @ 2009-08-14  7:26 UTC (permalink / raw)
  To: Karl Wiberg; +Cc: Catalin Marinas, git

On Fri, Aug 14, 2009 at 8:21 AM, Karl Wiberg<kha@virtutech.com> wrote:
> On 2009-08-13 23:18:59 +0100, Catalin Marinas wrote:
>
>> Since we are only using Git, I'll merge this patch
>> (and maybe add a comment).
>
> Sure.

Wouldn't it be better to only use the write() workaround when
sys.version indicates that the Python bug may be present, and use
communicate() properly otherwise?

Erik

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Work around performance bug in subprocess.Popen.communicate()
  2009-08-14  7:26         ` Erik Sandberg
@ 2009-08-14  8:12           ` Karl Wiberg
  0 siblings, 0 replies; 7+ messages in thread
From: Karl Wiberg @ 2009-08-14  8:12 UTC (permalink / raw)
  To: Erik Sandberg; +Cc: Catalin Marinas, git

On 2009-08-14 09:26:06 +0200, Erik Sandberg wrote:

> Wouldn't it be better to only use the write() workaround when
> sys.version indicates that the Python bug may be present, and use
> communicate() properly otherwise?

In my opinion: no. The workaround has no negative effect (since we
know that git is well-behaved). Making it conditional would only make
the code more complex, without gaining us anything.

-- 
Karl Wiberg, Virtutech

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-08-14  8:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-31  9:36 [PATCH] Work around performance bug in subprocess.Popen.communicate() Karl Wiberg
2009-07-31 11:27 ` Catalin Marinas
2009-08-04  8:51   ` Karl Wiberg
2009-08-13 22:18     ` Catalin Marinas
2009-08-14  6:21       ` Karl Wiberg
2009-08-14  7:26         ` Erik Sandberg
2009-08-14  8:12           ` Karl Wiberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).