From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <323d7da1-3e66-bf22-42c1-1afa4df3aeb4@linaro.org>
Date: Fri, 23 Jun 2023 09:33:04 +0200
Subject: Re: [PATCH v4 10/17] target/riscv: Add Zvkned ISA extension support
From: Richard Henderson <richard.henderson@linaro.org>
To: Max Chou, qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: dbarboza@ventanamicro.com, Nazar Kazakov, Lawrence Hunter,
 William Salmon, Palmer Dabbelt, Alistair Francis, Bin Meng,
 Weiwei Li, Liu Zhiwei, Kiran Ostrolenk
References: <20230622161646.32005-1-max.chou@sifive.com>
 <20230622161646.32005-11-max.chou@sifive.com>
In-Reply-To: <20230622161646.32005-11-max.chou@sifive.com>
On 6/22/23 18:16, Max Chou wrote:
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -22,6 +22,7 @@
>  #include "qemu/bitops.h"
>  #include "qemu/bswap.h"
>  #include "cpu.h"
> +#include "crypto/aes.h"
>  #include "exec/memop.h"
>  #include "exec/exec-all.h"
>  #include "exec/helper-proto.h"
> @@ -195,3 +196,310 @@ RVVCALL(OPIVX2, vwsll_vx_w, WOP_UUU_W, H8, H4, DO_SLL)
>  GEN_VEXT_VX(vwsll_vx_b, 2)
>  GEN_VEXT_VX(vwsll_vx_h, 4)
>  GEN_VEXT_VX(vwsll_vx_w, 8)
> +
> +static inline void aes_sub_bytes(uint8_t round_state[4][4])
> +{
> +    for (int j = 0; j < 16; j++) {
> +        round_state[j / 4][j % 4] = AES_sbox[round_state[j / 4][j % 4]];
> +    }
> +}
> +
> +static inline void aes_shift_bytes(uint8_t round_state[4][4])
> +{
> +    uint8_t temp;
> +    temp = round_state[0][1];
> +    round_state[0][1] = round_state[1][1];
> +    round_state[1][1] = round_state[2][1];
> +    round_state[2][1] = round_state[3][1];
> +    round_state[3][1] = temp;
> +    temp = round_state[0][2];
> +    round_state[0][2] = round_state[2][2];
> +    round_state[2][2] = temp;
> +    temp = round_state[1][2];
> +    round_state[1][2] = round_state[3][2];
> +    round_state[3][2] = temp;
> +    temp = round_state[0][3];
> +    round_state[0][3] = round_state[3][3];
> +    round_state[3][3] = round_state[2][3];
> +    round_state[2][3] = round_state[1][3];
> +    round_state[1][3] = temp;
> +}
> +
> +static inline void xor_round_key(uint8_t round_state[4][4], uint8_t *round_key)
> +{
> +    for (int j = 0; j < 16; j++) {
> +        round_state[j / 4][j % 4] = round_state[j / 4][j % 4] ^ (round_key)[j];
> +    }
> +}
> +
> +static inline void aes_inv_sub_bytes(uint8_t round_state[4][4])
> +{
> +    for (int j = 0; j < 16; j++) {
> +        round_state[j / 4][j % 4] = AES_isbox[round_state[j / 4][j % 4]];
> +    }
> +}
> +
> +static inline void aes_inv_shift_bytes(uint8_t round_state[4][4])
> +{
> +    uint8_t temp;
> +    temp = round_state[3][1];
> +    round_state[3][1] = round_state[2][1];
> +    round_state[2][1] = round_state[1][1];
> +    round_state[1][1] = round_state[0][1];
> +    round_state[0][1] = temp;
> +    temp = round_state[0][2];
> +    round_state[0][2] = round_state[2][2];
> +    round_state[2][2] = temp;
> +    temp = round_state[1][2];
> +    round_state[1][2] = round_state[3][2];
> +    round_state[3][2] = temp;
> +    temp = round_state[0][3];
> +    round_state[0][3] = round_state[1][3];
> +    round_state[1][3] = round_state[2][3];
> +    round_state[2][3] = round_state[3][3];
> +    round_state[3][3] = temp;
> +}
> +
> +static inline uint8_t xtime(uint8_t x)
> +{
> +    return (x << 1) ^ (((x >> 7) & 1) * 0x1b);
> +}
> +
> +static inline uint8_t multiply(uint8_t x, uint8_t y)
> +{
> +    return (((y & 1) * x) ^ ((y >> 1 & 1) * xtime(x)) ^
> +            ((y >> 2 & 1) * xtime(xtime(x))) ^
> +            ((y >> 3 & 1) * xtime(xtime(xtime(x)))) ^
> +            ((y >> 4 & 1) * xtime(xtime(xtime(xtime(x))))));
> +}
> +
> +static inline void aes_inv_mix_cols(uint8_t round_state[4][4])
> +{
> +    uint8_t a, b, c, d;
> +    for (int j = 0; j < 4; ++j) {
> +        a = round_state[j][0];
> +        b = round_state[j][1];
> +        c = round_state[j][2];
> +        d = round_state[j][3];
> +        round_state[j][0] = multiply(a, 0x0e) ^ multiply(b, 0x0b) ^
> +                            multiply(c, 0x0d) ^ multiply(d, 0x09);
> +        round_state[j][1] = multiply(a, 0x09) ^ multiply(b, 0x0e) ^
> +                            multiply(c, 0x0b) ^ multiply(d, 0x0d);
> +        round_state[j][2] = multiply(a, 0x0d) ^ multiply(b, 0x09) ^
> +                            multiply(c, 0x0e) ^ multiply(d, 0x0b);
> +        round_state[j][3] = multiply(a, 0x0b) ^ multiply(b, 0x0d) ^
> +                            multiply(c, 0x09) ^ multiply(d, 0x0e);
> +    }
> +}
> +
> +static inline void aes_mix_cols(uint8_t round_state[4][4])
> +{
> +    uint8_t a, b;
> +    for (int j = 0; j < 4; ++j) {
> +        a = round_state[j][0];
> +        b = round_state[j][0] ^ round_state[j][1] ^ round_state[j][2] ^
> +            round_state[j][3];
> +        round_state[j][0] ^= xtime(round_state[j][0] ^ round_state[j][1]) ^ b;
> +        round_state[j][1] ^= xtime(round_state[j][1] ^ round_state[j][2]) ^ b;
> +        round_state[j][2] ^= xtime(round_state[j][2] ^ round_state[j][3]) ^ b;
> +        round_state[j][3] ^= xtime(round_state[j][3] ^ a) ^ b;
> +    }
> +}
> +
> +#define GEN_ZVKNED_HELPER_VV(NAME, ...)                                   \
> +    void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,  \
> +                      uint32_t desc)                                      \
> +    {                                                                     \
> +        uint64_t *vd = vd_vptr;                                           \
> +        uint64_t *vs2 = vs2_vptr;                                         \
> +        uint32_t vl = env->vl;                                            \
> +        uint32_t total_elems = vext_get_total_elems(env, desc, 4);        \
> +        uint32_t vta = vext_vta(desc);                                    \
> +                                                                          \
> +        for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {        \
> +            uint64_t round_key[2] = {                                     \
> +                cpu_to_le64(vs2[i * 2 + 0]),                              \
> +                cpu_to_le64(vs2[i * 2 + 1]),                              \
> +            };                                                            \
> +            uint8_t round_state[4][4];                                    \
> +            cpu_to_le64s(vd + i * 2 + 0);                                 \
> +            cpu_to_le64s(vd + i * 2 + 1);                                 \
> +            for (int j = 0; j < 16; j++) {                                \
> +                round_state[j / 4][j % 4] = ((uint8_t *)(vd + i * 2))[j]; \
> +            }                                                             \
> +            __VA_ARGS__;                                                  \
> +            for (int j = 0; j < 16; j++) {                                \
> +                ((uint8_t *)(vd + i * 2))[j] = round_state[j / 4][j % 4]; \
> +            }                                                             \
> +            le64_to_cpus(vd + i * 2 + 0);                                 \
> +            le64_to_cpus(vd + i * 2 + 1);                                 \
> +        }                                                                 \
> +        env->vstart = 0;                                                  \
> +        /* set tail elements to 1s */                                     \
> +        vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);              \
> +    }
> +
> +#define GEN_ZVKNED_HELPER_VS(NAME, ...)                                   \
> +    void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,  \
> +                      uint32_t desc)                                      \
> +    {                                                                     \
> +        uint64_t *vd = vd_vptr;                                           \
> +        uint64_t *vs2 = vs2_vptr;                                         \
> +        uint32_t vl = env->vl;                                            \
> +        uint32_t total_elems = vext_get_total_elems(env, desc, 4);        \
> +        uint32_t vta = vext_vta(desc);                                    \
> +                                                                          \
> +        for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {        \
> +            uint64_t round_key[2] = {                                     \
> +                cpu_to_le64(vs2[0]),                                      \
> +                cpu_to_le64(vs2[1]),                                      \
> +            };                                                            \
> +            uint8_t round_state[4][4];                                    \
> +            cpu_to_le64s(vd + i * 2 + 0);                                 \
> +            cpu_to_le64s(vd + i * 2 + 1);                                 \
> +            for (int j = 0; j < 16; j++) {                                \
> +                round_state[j / 4][j % 4] = ((uint8_t *)(vd + i * 2))[j]; \
> +            }                                                             \
> +            __VA_ARGS__;                                                  \
> +            for (int j = 0; j < 16; j++) {                                \
> +                ((uint8_t *)(vd + i * 2))[j] = round_state[j / 4][j % 4]; \
> +            }                                                             \
> +            le64_to_cpus(vd + i * 2 + 0);                                 \
> +            le64_to_cpus(vd + i * 2 + 1);                                 \
> +        }                                                                 \
> +        env->vstart = 0;                                                  \
> +        /* set tail elements to 1s */                                     \
> +        vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);              \
> +    }

See
https://lore.kernel.org/qemu-devel/20230620110758.787479-1-richard.henderson@linaro.org/
which should greatly simplify all of this.


r~