From: Lukas Fleischer <mlmmj@cryptocrack.de>
Date: Mon, 05 Sep 2011 12:46:42 +0000
Subject: Re: [mlmmj] read(2) syscall bloat
Message-Id: <20110905124642.GA7377@blizzard>
List-Id: <mlmmj.mlmmj.org>
References: <20110905115603.GC22957@barfooze.de>
In-Reply-To: <20110905115603.GC22957@barfooze.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: mlmmj@mlmmj.org

On Mon, Sep 05, 2011 at 10:34:19PM +1000, Ben Schmidt wrote:
> On 5/09/11 9:56 PM, Moritz Wilhelmy wrote:
> >mlmmj currently does a read(2) system call for every single byte it
> >reads from a file descriptor. This is unnecessarily inefficient and
> >slow.
> 
> Mmm. There've gotta be a lot of context switches happening there....
> 
> >Strace output is similar to the following:
> >open("/var/spool/mlmmj/foo/control/listaddress", O_RDONLY) = 4
> >read(4, "f", 1)                         = 1
> >read(4, "o", 1)                         = 1
> >read(4, "o", 1)                         = 1
> [...]
> >read(4, "\n", 1)                        = 1
> >close(4)                                = 0
> >
> >Given that there is a getline(3) function in POSIX.1-2008, shouldn't it
> >be possible to retire mygetline?
> 
> Not if getline() is new as of 2008; there are a lot of systems older
> than that around, and since Mlmmj is so nice and slim, it is an ideal
> candidate for running on older systems. I don't want to compromise that.

Well, if you really care about that, consider using fgets(), which is
part of C89, even. Or just write our own buffering implementation.
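
For illustration, a minimal sketch of the fgets() route, assuming the
caller only needs the first line of a small control file such as
control/listaddress from the strace above. stdio fills its own buffer
with large read(2) calls, so the per-byte syscalls go away. The helper
name read_first_line() is hypothetical, not existing mlmmj code, and
lines longer than the buffer are truncated (an fgets() limitation).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not mlmmj code): read the first line of the
 * file at path into buf, stripping the trailing newline. stdio reads
 * the file in buffer-sized chunks instead of one byte per read(2).
 * Returns 0 on success, -1 if the file can't be opened or is empty. */
static int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    size_t len;

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* fgets() keeps the '\n'; drop it, as a control-file value
     * shouldn't contain it. */
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[len - 1] = '\0';
    return 0;
}
```

Everything used here (fopen, fgets, fclose, strlen) is plain C89, so it
keeps the "runs on old systems" property Ben cares about.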

> 
> >I've previously posted this issue to the musl mailing list [1], which
> >has an "anti-bloat side project", but I've been putting the mail to this
> >list off.
> >
> >I don't see where any of Rich's arguments from [2] apply.
> 
> He's just pointing out that you can't reimplement mygetline() to read in
> larger chunks without some kind of buffering. This is because reading a
> larger chunk might read past end-of-line. If it does, then you have to
> rewind the stream (not always possible) or buffer the extra input so
> that the next call to mygetline() can use it.
> 
> >Can anyone please explain why it was done this way in the first place?
> 
> Not me.
> 
> Maybe we should do some profiling to see if this truly is a bottleneck
> or not.

Agreed, some numbers would be nice. Anyway, this shouldn't be too hard
to implement, and it will certainly bring some performance
improvement...
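
A minimal sketch of what "our own buffer implementation" could look
like, assuming a plain file descriptor as in the strace above. Bytes
read past the newline stay in a per-descriptor buffer and are consumed
by the next call, which is the buffering Ben describes. struct linebuf,
linebuf_init(), linebuf_getline() and the read_calls counter are
hypothetical names, not mlmmj's API; EINTR handling is left out.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffered line reader (not mlmmj code). */
struct linebuf {
    int fd;
    char buf[4096];
    size_t start;   /* first unconsumed byte in buf */
    size_t end;     /* one past the last valid byte in buf */
};

static unsigned long read_calls;  /* counts read(2) calls, for profiling */

static void linebuf_init(struct linebuf *lb, int fd)
{
    lb->fd = fd;
    lb->start = lb->end = 0;
}

/* Returns a malloc()ed line including its '\n' (if present), or NULL
 * at EOF/error with nothing pending. The caller frees the result. */
static char *linebuf_getline(struct linebuf *lb)
{
    char *line = NULL;
    size_t len = 0;

    for (;;) {
        char *nl;
        size_t chunk;
        char *tmp;

        /* Refill only when every buffered byte has been consumed. */
        if (lb->start == lb->end) {
            ssize_t n;
            read_calls++;
            n = read(lb->fd, lb->buf, sizeof lb->buf);
            if (n <= 0)
                return line;  /* EOF/error: last partial line or NULL */
            lb->start = 0;
            lb->end = (size_t)n;
        }

        /* Copy up to and including the next '\n', or all buffered bytes. */
        nl = memchr(lb->buf + lb->start, '\n', lb->end - lb->start);
        chunk = nl ? (size_t)(nl - (lb->buf + lb->start)) + 1
                   : lb->end - lb->start;

        tmp = realloc(line, len + chunk + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, lb->buf + lb->start, chunk);
        len += chunk;
        line[len] = '\0';
        lb->start += chunk;  /* bytes past the newline stay for next call */

        if (nl)
            return line;
    }
}
```

Reading a short three-line input this way takes a few read(2) calls
(one to fill the buffer, plus the ones that detect end-of-file) rather
than one per byte, and the counter gives a cheap number for the
profiling Ben suggests. The catch is the one Rich pointed out: the
leftover bytes live in struct linebuf, so nothing else may read from
that descriptor while it is in use.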

> 
> Ben.
> 
> >[1] http://www.openwall.com/lists/musl/2011/08/16/8
> >[2] http://www.openwall.com/lists/musl/2011/08/16/11