From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42339)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XhXqP-0007HZ-5L for qemu-devel@nongnu.org;
	Fri, 24 Oct 2014 01:55:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1XhXqG-0004ND-3l for qemu-devel@nongnu.org;
	Fri, 24 Oct 2014 01:55:25 -0400
Received: from mail-wg0-x22f.google.com ([2a00:1450:400c:c00::22f]:60277)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XhXqF-0004N8-T8 for qemu-devel@nongnu.org;
	Fri, 24 Oct 2014 01:55:16 -0400
Received: by mail-wg0-f47.google.com with SMTP id x13so327441wgg.18
	for ; Thu, 23 Oct 2014 22:55:15 -0700 (PDT)
Sender: Paolo Bonzini
Message-ID: <5449E9BE.9050900@redhat.com>
Date: Fri, 24 Oct 2014 07:55:10 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1414033363-31032-1-git-send-email-chao.p.peng@linux.intel.com>
	<20141023194923.GA25413@thinpad.lan.raisama.net>
	<20141024012716.GB3135@pengc-linux.bj.intel.com>
In-Reply-To: <20141024012716.GB3135@pengc-linux.bj.intel.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH] target-i386: add Intel AVX-512 support
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Chao Peng , Eduardo Habkost
Cc: kvm@vger.kernel.org, "Michael S. Tsirkin" , Marcelo Tosatti ,
	qemu-devel@nongnu.org, Vadim Rozenfeld , Laszlo Ersek ,
	=?windows-1252?Q?Andreas_F=E4rber?=

On 10/24/2014 03:27 AM, Chao Peng wrote:
> On Thu, Oct 23, 2014 at 05:49:23PM -0200, Eduardo Habkost wrote:
>> On Thu, Oct 23, 2014 at 11:02:43AM +0800, Chao Peng wrote:
>> [...]
>>> @@ -707,6 +714,24 @@ typedef union {
>>>  } XMMReg;
>>>
>>>  typedef union {
>>> +    uint8_t _b[32];
>>> +    uint16_t _w[16];
>>> +    uint32_t _l[8];
>>> +    uint64_t _q[4];
>>> +    float32 _s[8];
>>> +    float64 _d[4];
>>> +} YMMReg;
>>> +
>>> +typedef union {
>>> +    uint8_t _b[64];
>>> +    uint16_t _w[32];
>>> +    uint32_t _l[16];
>>> +    uint64_t _q[8];
>>> +    float32 _s[16];
>>> +    float64 _d[8];
>>> +} ZMMReg;
>>> +
>>> +typedef union {
>>>      uint8_t _b[8];
>>>      uint16_t _w[4];
>>>      uint32_t _l[2];
>>> @@ -725,6 +750,20 @@ typedef struct BNDCSReg {
>>>  } BNDCSReg;
>>>
>>>  #ifdef HOST_WORDS_BIGENDIAN
>>> +#define ZMM_B(n) _b[63 - (n)]
>>> +#define ZMM_W(n) _w[31 - (n)]
>>> +#define ZMM_L(n) _l[15 - (n)]
>>> +#define ZMM_S(n) _s[15 - (n)]
>>> +#define ZMM_Q(n) _q[7 - (n)]
>>> +#define ZMM_D(n) _d[7 - (n)]
>>> +
>>> +#define YMM_B(n) _b[31 - (n)]
>>> +#define YMM_W(n) _w[15 - (n)]
>>> +#define YMM_L(n) _l[7 - (n)]
>>> +#define YMM_S(n) _s[7 - (n)]
>>> +#define YMM_Q(n) _q[3 - (n)]
>>> +#define YMM_D(n) _d[3 - (n)]
>>> +
>>>  #define XMM_B(n) _b[15 - (n)]
>>>  #define XMM_W(n) _w[7 - (n)]
>>>  #define XMM_L(n) _l[3 - (n)]
>>> @@ -737,6 +776,20 @@ typedef struct BNDCSReg {
>>>  #define MMX_L(n) _l[1 - (n)]
>>>  #define MMX_S(n) _s[1 - (n)]
>>>  #else
>>> +#define ZMM_B(n) _b[n]
>>> +#define ZMM_W(n) _w[n]
>>> +#define ZMM_L(n) _l[n]
>>> +#define ZMM_S(n) _s[n]
>>> +#define ZMM_Q(n) _q[n]
>>> +#define ZMM_D(n) _d[n]
>>> +
>>> +#define YMM_B(n) _b[n]
>>> +#define YMM_W(n) _w[n]
>>> +#define YMM_L(n) _l[n]
>>> +#define YMM_S(n) _s[n]
>>> +#define YMM_Q(n) _q[n]
>>> +#define YMM_D(n) _d[n]
>>> +
>>
>> I am probably not being able to see some future use case of those data
>> structures, but: why all the extra complexity here, if only ZMM_Q and
>> YMM_Q are being used in the code, and the only place affected by the
>> ordering of YMMReg and ZMMReg array elements are the memcpy() calls on
>> kvm_{put,get}_xsave(), where the data always have the same layout?
>>
>
> Thanks Eduardo, then I feel comfortable to drop most of these macros and
> only keep YMM_Q/ZMM_Q left. As there is no actual benefit for ordering, I
> will also make these two endianness-insensitive.

I think we can keep the macros.  The actual cleanup would be to have a
single member for the 32 512-bit ZMM registers, instead of splitting
xmm/ymmh/zmmh/zmm_hi16.  This will get rid of the YMM_* and ZMM_*
registers.  However, we could not use simple memcpy()s to marshal in and
out of the XSAVE data.  We can do it in 2.2.

Paolo
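
For illustration only, here is a minimal sketch of what marshalling
between a single array of 32 512-bit registers and split XSAVE-style
components could look like.  It is not code from the patch or from QEMU:
XSaveSketch, zmm_to_xsave() and xsave_to_zmm() are hypothetical names,
the struct does not reproduce the real XSAVE offsets, and the byte
slicing assumes a little-endian host layout (element 0 is the least
significant byte of the register).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_ZMM 32

/* One full 512-bit register; only the views used below are declared. */
typedef union {
    uint8_t  _b[64];
    uint64_t _q[8];
} ZMMReg;

/*
 * Hypothetical, simplified stand-in for the split XSAVE components.
 * The real XSAVE area has different offsets and extra state.
 */
typedef struct {
    uint8_t xmm[16][16];      /* bits 127:0   of ZMM0..ZMM15 */
    uint8_t ymmh[16][16];     /* bits 255:128 of ZMM0..ZMM15 */
    uint8_t zmmh[16][32];     /* bits 511:256 of ZMM0..ZMM15 */
    uint8_t zmm_hi16[16][64]; /* all 512 bits of ZMM16..ZMM31 */
} XSaveSketch;

/* Scatter the unified register file into the split components. */
static void zmm_to_xsave(XSaveSketch *xs, const ZMMReg regs[NB_ZMM])
{
    assert(xs != NULL && regs != NULL);
    for (int i = 0; i < 16; i++) {
        memcpy(xs->xmm[i],      &regs[i]._b[0],  16);
        memcpy(xs->ymmh[i],     &regs[i]._b[16], 16);
        memcpy(xs->zmmh[i],     &regs[i]._b[32], 32);
        memcpy(xs->zmm_hi16[i], regs[i + 16]._b, 64);
    }
}

/* Gather the split components back into the unified register file. */
static void xsave_to_zmm(ZMMReg regs[NB_ZMM], const XSaveSketch *xs)
{
    assert(xs != NULL && regs != NULL);
    for (int i = 0; i < 16; i++) {
        memcpy(&regs[i]._b[0],  xs->xmm[i],      16);
        memcpy(&regs[i]._b[16], xs->ymmh[i],     16);
        memcpy(&regs[i]._b[32], xs->zmmh[i],     32);
        memcpy(regs[i + 16]._b, xs->zmm_hi16[i], 64);
    }
}
```

With the split fields, each component can be copied with one memcpy()
per XSAVE region; with the unified array, the low 16 registers need the
per-register slicing shown in the loops, which is the cost mentioned
above.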