From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4EE0C9D3.4000201@linux.vnet.ibm.com>
Date: Thu, 08 Dec 2011 22:29:39 +0800
From: Mark Wu
References: <1323259859-8709-1-git-send-email-stefanha@linux.vnet.ibm.com> <1323259859-8709-3-git-send-email-stefanha@linux.vnet.ibm.com>
In-Reply-To: <1323259859-8709-3-git-send-email-stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support
To: Stefan Hajnoczi
Cc: Kevin Wolf, Marcelo Tosatti, qemu-devel@nongnu.org

I tried to optimize the zero-detection code with SSE instructions. The idea comes from Paolo's patch "migration: vectorize is_dup_page". It was expected to give us a noticeable improvement.
But I didn't see any improvement in the qemu-io test, even though I increased the image size to 5GB. The following is my test patch. Could you please review it to see whether I made a mistake and whether SSE can help with zero detection?

Thanks.

diff --git a/block/qed.c b/block/qed.c
index 75a44f3..61e4a27 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -998,6 +998,14 @@ static void qed_aio_write_l2_update_cb(void *opaque, int ret)
     qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
 }
 
+#ifdef __SSE2__
+#include <emmintrin.h>
+#define VECTYPE        __m128i
+#define SPLAT(p)       _mm_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
+#define VECTYPE_ZERO   _mm_setzero_si128()
+#endif
+
 /**
  * Determine if we have a zero write to a block of clusters
  *
@@ -1027,6 +1035,19 @@ static bool qed_is_zero_write(QEDAIOCB *acb)
         }
 
         v = iov->iov_base;
+
+#ifdef __SSE2__
+        if ((iov->iov_len & 0x0f)) {
+            VECTYPE zero = VECTYPE_ZERO;
+            VECTYPE *p = (VECTYPE *)v;
+            for(j = 0; j < iov->iov_len / sizeof(VECTYPE); j++) {
+                if (!ALL_EQ(p[j], zero)) {
+                    return false;
+                }
+            }
+            continue;
+        }
+#endif
         for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
             if (v[j >> 3]) {
                 return false;
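
A note on the guard in the patch above: `(iov->iov_len & 0x0f)` is nonzero only when the length is not a multiple of 16. If the test buffers are multiples of 16 bytes, which seems likely for sector-sized I/O, the SSE loop is skipped and the original scalar loop does all the work, which may be why no difference shows up. When the branch is taken, the loop also stops at the last full 16-byte chunk and leaves the tail unchecked, and dereferencing `p[j]` assumes `iov_base` is 16-byte aligned. The following is a minimal standalone sketch of one way to handle these points. The helper name `buffer_is_zero_sketch` is hypothetical and not part of block/qed.c, and the unaligned load is an assumption made for safety rather than something measured here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not from block/qed.c): returns true if all
 * len bytes starting at buf are zero.
 */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

#ifdef __SSE2__
    const __m128i zero = _mm_setzero_si128();

    /* Vector path: runs for every full 16-byte chunk, whatever len is. */
    for (; i + sizeof(__m128i) <= len; i += sizeof(__m128i)) {
        /* Unaligned load, so buf needs no particular alignment. */
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));

        /* All 16 byte lanes compare equal to zero iff the mask is 0xFFFF. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF) {
            return false;
        }
    }
#endif

    /* Scalar path: the remaining tail, or the whole buffer without SSE2. */
    for (; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}
```

In the patch itself, the smaller change would be to test `(iov->iov_len & 0x0f) == 0` before entering the SSE loop, so that the vector path runs for lengths that are a multiple of 16 and the existing scalar loop handles everything else. If the buffers are known to be 16-byte aligned, the plain `p[j]` access can stay.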