From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42075) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c7lt2-0000VW-RJ for qemu-devel@nongnu.org; Fri, 18 Nov 2016 11:19:41 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c7lsy-0003mG-Rl for qemu-devel@nongnu.org; Fri, 18 Nov 2016 11:19:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38910) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c7lsy-0003lp-HD for qemu-devel@nongnu.org; Fri, 18 Nov 2016 11:19:32 -0500 Date: Fri, 18 Nov 2016 11:19:28 -0500 From: Jeff Cody Message-ID: <20161118161928.GB18745@localhost.localdomain> References: <1475035789-685-1-git-send-email-ashish.mittal@veritas.com> <20160928214510.GA2837@stefanha-x1.localdomain> <20161118072621.GA2607@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Ketan Nilangekar Cc: Stefan Hajnoczi , ashish mittal , qemu-devel , Paolo Bonzini , Kevin Wolf , Markus Armbruster , "Daniel P. Berrange" , Fam Zheng , Ashish Mittal , Abhijit Dey , Buddhi Madhav , "Venkatesha M.G." , Abhishek Kane On Fri, Nov 18, 2016 at 10:34:54AM +0000, Ketan Nilangekar wrote: > > > > > > On 11/18/16, 12:56 PM, "Jeff Cody" wrote: > > >On Wed, Nov 16, 2016 at 08:12:41AM +0000, Stefan Hajnoczi wrote: > >> On Tue, Nov 15, 2016 at 10:38 PM, ashish mittal wrote: > >> > On Wed, Sep 28, 2016 at 2:45 PM, Stefan Hajnoczi wrote: > >> >> On Tue, Sep 27, 2016 at 09:09:49PM -0700, Ashish Mittal wrote: > >> >> 5. > >> >> I don't see any endianness handling or portable alignment of struct > >> >> fields in the network protocol code. Binary network protocols need to > >> >> take care of these issue for portability. This means libqnio compiled > >> >> for different architectures will not work. Do you plan to support any > >> >> other architectures besides x86? > >> >> > >> > > >> > No, we support only x86 and do not plan to support any other arch. > >> > Please let me know if this necessitates any changes to the configure > >> > script. > >> > >> I think no change to ./configure is necessary. The library will only > >> ship on x86 so other platforms will never attempt to compile the code. > >> > >> >> 6. > >> >> The networking code doesn't look robust: kvset uses assert() on input > >> >> from the network so the other side of the connection could cause SIGABRT > >> >> (coredump), the client uses the msg pointer as the cookie for the > >> >> response packet so the server can easily crash the client by sending a > >> >> bogus cookie value, etc. Even on the client side these things are > >> >> troublesome but on a server they are guaranteed security issues. I > >> >> didn't look into it deeply. Please audit the code. > >> >> > >> > > >> > By design, our solution on OpenStack platform uses a closed set of > >> > nodes communicating on dedicated networks. VxHS servers on all the > >> > nodes are on a dedicated network. Clients (qemu) connects to these > >> > only after reading the server IP from the XML (read by libvirt). The > >> > XML cannot be modified without proper access. Therefore, IMO this > >> > problem would be relevant only if someone were to use qnio as a > >> > generic mode of communication/data transfer, but for our use-case, we > >> > will not run into this problem. Is this explanation acceptable? > >> > >> No. The trust model is that the guest is untrusted and in the worst > >> case may gain code execution in QEMU due to security bugs. > >> > >> You are assuming block/vxhs.c and libqnio are trusted but that > >> assumption violates the trust model. > >> > >> In other words: > >> 1. Guest exploits a security hole inside QEMU and gains code execution > >> on the host. > >> 2. Guest uses VxHS client file descriptor on host to send a malicious > >> packet to VxHS server. > >> 3. VxHS server is compromised by guest. > >> 4. Compromised VxHS server sends malicious packets to all other > >> connected clients. > >> 5. All clients have been compromised. > >> > >> This means both the VxHS client and server must be robust. They have > >> to validate inputs to avoid buffer overflows, assertion failures, > >> infinite loops, etc. > >> > >> Stefan > > > > > >The libqnio code is important with respect to the VxHS driver. It is a bit > >different than other existing external protocol drivers, in that the current > >user and developer base is small, and the code itself is pretty new. So I > >think for the VxHS driver here upstream, we really do need to get some of > >the libqnio issues squared away. I don't know if we've ever explicitly > >address the extent to which libqnio issues affect the driver > >merging, so I figure it is probably worth discussing here. > > > >To try and consolidate libqnio discussion, here is what I think I've read / > >seen from others as the major issues that should be addressed in libqnio: > > > >* Code auditing, static analysis, and general code cleanup. Things like > > memory leaks shouldn't be happening, and some prior libqnio compiler > > warnings imply that there is more code analysis that should be done with > > libqnio. > > > > (With regards to memory leaks: Valgrind may be useful to track these down: > > > > # valgrind ./qemu-io -c 'write -pP 0xae 66000 128k' \ > > vxhs://localhost/test.raw > > > > ==30369== LEAK SUMMARY: > > ==30369== definitely lost: 4,168 bytes in 2 blocks > > ==30369== indirectly lost: 1,207,720 bytes in 58,085 blocks) > > We have done and are doing exhaustive memory leak tests using valgrind. > Memory leaks within qnio have been addressed to some extent. We will post > detailed valgrind results to this thread. > That is good to hear. I ran the above on the latest HEAD from the qnio github repo, so I look forward to checking out the latest code once it is available. > > > >* Potential security issues such as buffer overruns, input validation, etc., > > need to be audited. > > We have known a few such issues from previous comments and have addressed > some of those. If there are any important outstanding ones, please let us > know and we will fix them on priority. > One concern is that the issues noted are not from an exhaustive review on Stefan's part, AFAIK. When Stefan called for auditing the code, that is really a call to look for other potential security flaws as well, using the perspective he outlined on the trust model. > > > >* Async operations need to be truly asynchronous, without blocking calls. > > There is only one blocking call in libqnio reconnect which we has been > pointed out. We will fix this soon. > Great, thanks! > > > >* Daniel pointed out that there is no authentication method for taking to a > > remote server. This seems a bit scary. Maybe all that is needed here is > > some clarification of the security scheme for authentication? My > > impression from above is that you are relying on the networks being > > private to provide some sort of implicit authentication, though, and this > > seems fragile (and doesn't protect against a compromised guest or other > > process on the server, for one). > > Our auth scheme is based on network isolation at L2/L3 level. If there is > a simplified authentication mechanism which we can implement without > imposing significant penalties on IO performance, please let us know and > we will implement that if feasible. > > > > >(if I've missed anything, please add it here!) > > > >-Jeff