From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59873)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1YtHJN-0006fJ-T2
	for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1YtHJK-0000tR-9P
	for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:05 -0400
Received: from lhrrgout.huawei.com ([194.213.3.17]:1658)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1YtHJK-0000so-1p
	for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:02 -0400
Message-ID: <55560D26.3080202@huawei.com>
Date: Fri, 15 May 2015 17:13:42 +0200
From: Claudio Fontana <claudio.fontana@huawei.com>
MIME-Version: 1.0
References: <1431692725-27656-1-git-send-email-hw.claudio@gmail.com>	<1431692725-27656-2-git-send-email-hw.claudio@gmail.com>	<5555FB55.5070909@huawei.com>
	<5555FEBE.5040007@redhat.com> <5556090E.7070408@redhat.com>
In-Reply-To: <5556090E.7070408@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, hw.claudio@gmail.com, Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, Gonglei <arei.gonglei@huawei.com>, qemu-devel@nongnu.org, karl@freefriends.org

Hello Eric,

On 15.05.2015 16:56, Eric Blake wrote:
> On 05/15/2015 08:12 AM, Paolo Bonzini wrote:
>>
>>
>> On 15/05/2015 15:57, Claudio Fontana wrote:
>>> The header here mentions GPLv2+, but the module data for memmem in gnulib mentions LGPLv2+.
>>>
>>> Very confusing.
>>>
>>> The COPYING file in gnulib mentions:
>>>
>>> "The files in here are mostly copyright (C) Free Software Foundation, and
>>> are under assorted licenses.  Mostly, but not entirely, GPL.
>>>
>>> Many modules are provided dual-license, either GPL or LGPL at your
>>> option.  The headers of files in the lib directory (e.g., lib/error.c)
>>> state GPL for convenience, since the bulk of current gnulib users are
>>> GPL'd programs.  But the files in the modules directory (e.g.,
>>> modules/error) state the true license of each file, [...]
>>> "
>>>
>>> So if one would want to include the gnulib code without using the gnulib esoteric modules mechanisms, what should one do?
>>
>> Change it manually, I guess.
> 
> It's still possible to use gnulib-tool for a one-time export of the
> modules in question into a temporary project with correct licensing
> headers written by gnulib-tool, then copy from that temporary project
> into qemu.  It's probably just as fast to do it by hand, but using
> gnulib-tool in the procedure (and documenting in the commit message what
> that procedure was, to make it easier for the next guy to copy) is
> probably friendlier.  I'm not sure off-hand what the command line would
> be, but could help if that's the approach we want to take.
> 
> Or, since gnulib's memmem is (mostly) sync'd with glibc's memmem (or at
> least was in sync at one point), and glibc is under LGPLv2+, why not
> copy straight from glibc instead of from gnulib, and then you don't have
> to modify any headers?  (I haven't looked lately, but glibc may have
> optimizations that should be backported into gnulib)

I will check the status of glibc's version, seems a good idea to get it from there.

> 
> Historical note: gnulib had this particular implementation of memmem
> first, as written by me under a dual license; then I got glibc to
> incorporate it under just LGPLv2+; then since the initial
> implementation, glibc has been better optimized by others. If you really
> want to, you could observe that since I released the same original
> memmem implementation under multiple licenses at the time I first wrote
> it, you could therefore find and copy an older version under an even
> looser license in the newlib project (but without glibc's enhancements,
> as I can't dual-license someone else's follow-up work):
> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/memmem.c;h=25704e46;hb=HEAD
> 
> Or back to the original question - why are we worrying about the O(n)
> memmem implementation when the naive O(n^2) is MUCH shorter and easier
> to understand, and where the scaling penalty is only apparent on really
> long corner case strings?  Is our use of memmem going to hit the corner
> cases where scaling matters?

For the specific use case here (memory search functionality),
the haystack is normally the factor which really can scale up a lot.

I would expect the needle to be usually smallish (though for string search sometimes the needle could probably arrive at the threshold of 32).

When scanning a very large memory range, considering a binary string with roughly length around 32, do you think that the non-trivial implementation could bring some noticeable benefits?

If it's below 32, do we get any benefits compared to the trivial implementation?

> 
> [Personal note: I'm still tickled pink every time I see yet another
> project adopt my O(n) code - I didn't come up with the algorithm, but
> merely scraped it from public research, and was shocked to note how many
> libc still had O(n^2) code at the time.  But it is still gratifying to
> see that my implementation is getting such wide adoption]
> 

Well your code and its derived versions have been tried and tested a lot, so picking it up just makes sense to me, to avoid getting into the same corner cases and issues that your implementation already had to go through. And I am a lazy person of course!

Ciao,

Claudio