From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59873) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YtHJN-0006fJ-T2 for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YtHJK-0000tR-9P for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:05 -0400 Received: from lhrrgout.huawei.com ([194.213.3.17]:1658) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YtHJK-0000so-1p for qemu-devel@nongnu.org; Fri, 15 May 2015 11:14:02 -0400 Message-ID: <55560D26.3080202@huawei.com> Date: Fri, 15 May 2015 17:13:42 +0200 From: Claudio Fontana MIME-Version: 1.0 References: <1431692725-27656-1-git-send-email-hw.claudio@gmail.com> <1431692725-27656-2-git-send-email-hw.claudio@gmail.com> <5555FB55.5070909@huawei.com> <5555FEBE.5040007@redhat.com> <5556090E.7070408@redhat.com> In-Reply-To: <5556090E.7070408@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , Paolo Bonzini , hw.claudio@gmail.com, Luiz Capitulino Cc: Peter Maydell , Gonglei , qemu-devel@nongnu.org, karl@freefriends.org Hello Eric, On 15.05.2015 16:56, Eric Blake wrote: > On 05/15/2015 08:12 AM, Paolo Bonzini wrote: >> >> >> On 15/05/2015 15:57, Claudio Fontana wrote: >>> The header here mentions GPLv2+, but the module data for memmem in gnulib mentions LGPLv2+. >>> >>> Very confusing. >>> >>> The COPYING file in gnulib mentions: >>> >>> "The files in here are mostly copyright (C) Free Software Foundation, and >>> are under assorted licenses. Mostly, but not entirely, GPL. >>> >>> Many modules are provided dual-license, either GPL or LGPL at your >>> option. The headers of files in the lib directory (e.g., lib/error.c) >>> state GPL for convenience, since the bulk of current gnulib users are >>> GPL'd programs. But the files in the modules directory (e.g., >>> modules/error) state the true license of each file, [...] >>> " >>> >>> So if one would want to include the gnulib code without using the gnulib esoteric modules mechanisms, what should one do? >> >> Change it manually, I guess. > > It's still possible to use gnulib-tool for a one-time export of the > modules in question into a temporary project with correct licensing > headers written by gnulib-tool, then copy from that temporary project > into qemu. It's probably just as fast to do it by hand, but using > gnulib-tool in the procedure (and documenting in the commit message what > that procedure was, to make it easier for the next guy to copy) is > probably friendlier. I'm not sure off-hand what the command line would > be, but could help if that's the approach we want to take. > > Or, since gnulib's memmem is (mostly) sync'd with glibc's memmem (or at > least was in sync at one point), and glibc is under LGPLv2+, why not > copy straight from glibc instead of from gnulib, and then you don't have > to modify any headers? (I haven't looked lately, but glibc may have > optimizations that should be backported into gnulib) I will check the status of glibc's version, seems a good idea to get it from there. > > Historical note: gnulib had this particular implementation of memmem > first, as written by me under a dual license; then I got glibc to > incorporate it under just LGPLv2+; then since the initial > implementation, glibc has been better optimized by others. If you really > want to, you could observe that since I released the same original > memmem implementation under multiple licenses at the time I first wrote > it, you could therefore find and copy an older version under an even > looser license in the newlib project (but without glibc's enhancements, > as I can't dual-license someone else's follow-up work): > https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/memmem.c;h=25704e46;hb=HEAD > > Or back to the original question - why are we worrying about the O(n) > memmem implementation when the naive O(n^2) is MUCH shorter and easier > to understand, and where the scaling penalty is only apparent on really > long corner case strings? Is our use of memmem going to hit the corner > cases where scaling matters? For the specific use case here (memory search functionality), the haystack is normally the factor which really can scale up a lot. I would expect the needle to be usually smallish (though for string search sometimes the needle could probably arrive at the threshold of 32). When scanning a very large memory range, considering a binary string with roughly length around 32, do you think that the non-trivial implementation could bring some noticeable benefits? If it's below 32, do we get any benefits compared to the trivial implementation? > > [Personal note: I'm still tickled pink every time I see yet another > project adopt my O(n) code - I didn't come up with the algorithm, but > merely scraped it from public research, and was shocked to note how many > libc still had O(n^2) code at the time. But it is still gratifying to > see that my implementation is getting such wide adoption] > Well your code and its derived versions have been tried and tested a lot, so picking it up just makes sense to me, to avoid getting into the same corner cases and issues that your implementation already had to go through. And I am a lazy person of course! Ciao, Claudio