From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36200) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bgoTa-0006gE-6K for qemu-devel@nongnu.org; Mon, 05 Sep 2016 03:37:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bgoTV-0001de-03 for qemu-devel@nongnu.org; Mon, 05 Sep 2016 03:37:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:54920) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bgoTU-0001ci-MP for qemu-devel@nongnu.org; Mon, 05 Sep 2016 03:37:48 -0400 References: <1472559136-89096-1-git-send-email-arei.gonglei@huawei.com> <1472559136-89096-2-git-send-email-arei.gonglei@huawei.com> <57C82F16.6000801@suse.de> <33183CC9F5247A488A2544077AF19020B03B3096@SZXEMA503-MBS.china.huawei.com> <662664ba-c092-8f30-9994-12f5d67224f0@suse.de> <33183CC9F5247A488A2544077AF19020B03B32DB@SZXEMA503-MBS.china.huawei.com> <2289c8d7-5d4c-84e2-99df-19abd1a90c79@suse.de> <33183CC9F5247A488A2544077AF19020B03B37E9@SZXEMA503-MBS.china.huawei.com> From: Alexander Graf Message-ID: <57CD20C8.5050309@suse.de> Date: Mon, 5 Sep 2016 09:37:44 +0200 MIME-Version: 1.0 In-Reply-To: <33183CC9F5247A488A2544077AF19020B03B37E9@SZXEMA503-MBS.china.huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Gonglei (Arei)" , Ola Liljedahl , "qemu-devel@nongnu.org" , "virtio-dev@lists.oasis-open.org" Cc: "Huangpeng (Peter)" , Luonengjun , "mst@redhat.com" , "cornelia.huck@de.ibm.com" , "stefanha@redhat.com" , "denglingli@chinamobile.com" , Jani Kokkonen , "Varun.Sethi@freescale.com" , "xin.zeng@intel.com" , "brian.a.keating@intel.com" , "liang.j.ma@intel.com" , "john.griffin@intel.com" , "Hanweidong (Randy)" , "Huangweidong (C)" , "mike.caraman@nxp.com" , Claudio Fontana , "Zhoujian (jay, Euler)" On 09/03/2016 05:18 AM, Gonglei (Arei) wrote: > Hi, > >> -----Original Message----- >> From: Alexander Graf [mailto:agraf@suse.de] >> Sent: Friday, September 02, 2016 10:05 PM >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device sp= ecification >> >> >> >> On 02.09.16 14:16, Ola Liljedahl wrote: >>> >>> On 02/09/2016, 12:26, "Gonglei (Arei)" wrot= e: >>> >>>>> -----Original Message----- >>>>> From: Alexander Graf [mailto:agraf@suse.de] >>>>> Sent: Friday, September 02, 2016 4:07 PM >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >>>>> specification >>>>> >>>>> >>>>> >>>>> On 02.09.16 05:08, Gonglei (Arei) wrote: >>>>>> Hi Alex, >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Alexander Graf [mailto:agraf@suse.de] >>>>>>> Sent: Thursday, September 01, 2016 9:37 PM >>>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto devi= ce >>>>> specification >>>>>>> On 08/30/2016 02:12 PM, Gonglei wrote: >>>>>>>> The virtio crypto device is a virtual crypto device (ie. hardwar= e >>>>>>>> crypto accelerator card). The virtio crypto device can provide >>>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, >> PRIMITIVE. >>>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. >>>>>>> I have mostly a few high level comments. >>>>>>> >>>>>>> For starters, a lot of the structs rely on the compiler to pad th= em >>>>> to >>>>>>> natural alignment. That may get us into trouble when trying to >>>>> emulate a >>>>>>> virtio device on a host with different guest architecture (like a= rm >>>>> on >>>>>>> x86). It'd be a good idea to manually pad everything to be 64bit >>>>> aligned >>>>>>> - then all fields are always at the same spot. >>>>>>> >>>>>> Good point! I'll do this in the next version. Thanks! >>>>>> >>>>>>> I also have a hard time getting my head around the final flow of >>>>>>> everything. Do I always have to create a session to be able to em= it a >>>>>>> command? In that case, doesn't that slow down everything, since a >>>>>>> request would then need to wait for the host reply to receive its >>>>>>> session id? There should be some way to fire off a simple non-iv >>>>>>> operation without any session set up imho. >>>>>>> >>>>>> For symmetric algorithms, we'd better create a session before >>>>> executing >>>>> encryption >>>>>> Or decryption, because the session usually be kept for a specific >>>>>> algorithm with specific key in the production environment. And if = we >>>>> only >>>>> change the iv, >>>>>> we don't need to re-create the session. >>>>> I think we have a slight misunderstanding here :) >>>>> >>>>> The current workflow is >>>>> >>>>> -> create session >>>>> <- session key >>>>> -> data in >>>>> <- data out >>>>> ... >>>>> <- close session >>>>> -> ack >>>>> >>>>> That means that at least for the first packet you have at least one= full >>>>> round-trip cost from guest to host to guest to be able to send any = data. >>>>> >>>> Yes, that's true... >>>> >>>>> That sounds pretty expensive to me on the latency side. There are >>>>> multiple ways to mitigate that. One idea was to have a separate pat= h in >>>>> parallel to the create session + data + close session dance that wo= uld >>>>> combine them all into a single command. You would still have the se= ssion >>>>> based version, but accelerate the one-blob case. >>>>> >>>>> Another idea would be to make the guest be the session id janitor. = Then >>>>> you could just do >>>>> >>>>> -> create session with key X >>>>> -> data in >>>>> <- data out >>>>> ... >>>>> >>>>> so you save the round trip, if you combine command and data queues,= as >>>>> then the create and data bits are serialized by their position in t= he >>>>> queue. >>>>> >>>> ... But for lots of crypto requests, the exhaust is low: >>>> >>>> -> create session with key X >>>> <- session id >>>> //do something in the guest if you like >>>> -> data in with session_id >>>> -> data in with session_id >>>> -> data in with session_id >>>> ... ... (fill out data virtqueue) >>>> <-data out >>>> <-data out >>>> <-data out >>>> >>>> And this is what the production environment do currently. >>>> >>>>>> For the asymmetric algorithms, we don't need create a session IIRC= . >>>>>> >>>>>> So, I don't think this is a performance degradation, but a perform= ance >>>>> enhancement. >>>>>>> Also, I don't fully understand the split between control and data >>>>>>> queues. As far as I read things, the control queue is used to cre= ate >>>>>>> session ids and the data queues can then be used to push data. Is >>>>> there >>>>>>> any particular reason for the split? One thing that seems natural= to >>>>> me >>>>>>> would be to have sessions be per-queue, so you would create a >>>>> session on >>>>>>> a particular queue and only have it ever be available there. That= way >>>>>>> you get rid of any locking for sessions. >>>>>>> >>>>>> We want to keep a unify request type (structure) for data queue, s= o >>>>> we can >>>>>> keep the session operation in the control queue. Of course the >>>>> control queue >>>>>> only used to create sessions currently, but we can extend its >>>>> functions if >>>>> needed >>>>>> in the future. >>>>> I still don't understand. With separate control+data queues you jus= t get >>>>> yourself into synchronization troubles. Both struct >>>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header alread= y >>>>> have an opcode as first le32 field. You can easily use that to dete= rmine >>>>> both length of the payload as well as command (control vs data). >>>>> >>>> There is a big problem that the control handle logic is synchronizat= ion, >>>> but the data queue >>>> handling logic is asynchronization. We can't combine them into one q= ueue. >>>> It will decrease the performance because you need indentify each pac= ket >>>> if we do this forcedly. >>> Are you saying that control and data operations are handled by separa= te >>> "blocks=B2? >>> If you combined control and data queues, there would have to be a (SW= ) >>> demultiplexer >>> that would add overhead (and potentially decrease throughout) especia= lly >>> for the data >>> operations? >> Uh, the multiplexer is as simple as a switch() statement on the opcode= , >> no? It might stall that one particular queue, but that sounds like >> something you can improve by increasing the number of queues (and inve= nt >> something smart to ease polling of activity on more queues). >> > Actually, the virtio multiple queue's capacity is based on the backend = crypto device, > like multiple queue tap device work with virtio-net. So what happens when you have 5 crypto accelerator cards in your system? = :) Alex