From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Wang Subject: Re: [PULL] vhost: cleanups and fixes Date: Tue, 12 Jun 2018 19:05:19 +0800 Message-ID: <5B1FA8EF.4030409@intel.com> References: <20180611192353-mutt-send-email-mst@kernel.org> <20180612042600-mutt-send-email-mst@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: virtualization-bounces@lists.linux-foundation.org Errors-To: virtualization-bounces@lists.linux-foundation.org To: Linus Torvalds , "Michael S. Tsirkin" Cc: KVM list , Network Development , Linux Kernel Mailing List , Bjorn Andersson , Andrew Morton , virtualization List-Id: virtualization@lists.linuxfoundation.org On 06/12/2018 09:59 AM, Linus Torvalds wrote: > On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin wrote: >> Maybe it will help to have GFP_NONE which will make any allocation >> fail if attempted. Linus, would this address your comment? > It would definitely have helped me initially overlook that call chain. > > But then when I started looking at the whole dma_map_page() thing, it > just raised my hackles again. > > I would seriously suggest having a much simpler version for the "no > allocation, no dma mapping" case, so that it's *obvious* that that > never happens. > > So instead of having virtio_balloon_send_free_pages() call a really > generic complex chain of functions that in _some_ cases can do memory > allocation, why isn't there a short-circuited "vitruque_add_datum()" > that is guaranteed to never do anything like that? > > Honestly, I look at "add_one_sg()" and it really doesn't make me > happy. It looks hacky as hell. If I read the code right, you're really > trying to just queue up a simple tuple of , except you encode > it as a page pointer in order to play games with the SG logic, and > then you hmap that to the ring, except in this case it's all a fake > ring that just adds the cpu-physical address instead. > > And to figuer that out, it's like five layers of indirection through > different helper functions that *can* do more generic things but in > this case don't. > > And you do all of this from a core VM callback function with some > _really_ core VM locks held. > > That makes no sense to me. > > How about this: > > - get rid of all that code > > - make the core VM callback save the "these are the free memory > regions" in a fixed and limited array. One that DOES JUST THAT. No > crazy "SG IO dma-mapping function crap". Just a plain array of a fixed > size, pre-allocated for that virtio instance. > > - make it obvious that what you do in that sequence is ten > instructions and no allocations ("Look ma, I wrote a value to an array > and incremented the array idex, and I'M DONE") > > - then in that workqueue entry that you start *anyway*, you empty the > array and do all the crazy virtio stuff. > > In fact, while at it, just simplify the VM interface too. Instead of > traversing a random number of buddy lists, just trraverse *one* - the > top-level one. Are you seriously ever going to shrink or mark > read-only anythin *but* something big enough to be in the maximum > order? > > MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really* > want the balloon code to work on smaller things, particularly since > the whole interface is fundamentally racy and opportunistic to begin > with? OK, I will implement a new version based on the suggestions. Thanks. Best, Wei From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by aws-us-west-2-korg-lkml-1.web.codeaurora.org (Postfix) with ESMTP id 11442C5CFF1 for ; Tue, 12 Jun 2018 11:01:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C3CAF208B0 for ; Tue, 12 Jun 2018 11:01:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3CAF208B0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933194AbeFLLBa (ORCPT ); Tue, 12 Jun 2018 07:01:30 -0400 Received: from mga05.intel.com ([192.55.52.43]:26823 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932286AbeFLLB0 (ORCPT ); Tue, 12 Jun 2018 07:01:26 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Jun 2018 04:01:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,214,1526367600"; d="scan'208";a="236732334" Received: from unknown (HELO [10.239.13.97]) ([10.239.13.97]) by fmsmga006.fm.intel.com with ESMTP; 12 Jun 2018 04:01:24 -0700 Message-ID: <5B1FA8EF.4030409@intel.com> Date: Tue, 12 Jun 2018 19:05:19 +0800 From: Wei Wang User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Linus Torvalds , "Michael S. Tsirkin" CC: KVM list , virtualization , Network Development , Linux Kernel Mailing List , Andrew Morton , Bjorn Andersson Subject: Re: [PULL] vhost: cleanups and fixes References: <20180611192353-mutt-send-email-mst@kernel.org> <20180612042600-mutt-send-email-mst@kernel.org> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/12/2018 09:59 AM, Linus Torvalds wrote: > On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin wrote: >> Maybe it will help to have GFP_NONE which will make any allocation >> fail if attempted. Linus, would this address your comment? > It would definitely have helped me initially overlook that call chain. > > But then when I started looking at the whole dma_map_page() thing, it > just raised my hackles again. > > I would seriously suggest having a much simpler version for the "no > allocation, no dma mapping" case, so that it's *obvious* that that > never happens. > > So instead of having virtio_balloon_send_free_pages() call a really > generic complex chain of functions that in _some_ cases can do memory > allocation, why isn't there a short-circuited "vitruque_add_datum()" > that is guaranteed to never do anything like that? > > Honestly, I look at "add_one_sg()" and it really doesn't make me > happy. It looks hacky as hell. If I read the code right, you're really > trying to just queue up a simple tuple of , except you encode > it as a page pointer in order to play games with the SG logic, and > then you hmap that to the ring, except in this case it's all a fake > ring that just adds the cpu-physical address instead. > > And to figuer that out, it's like five layers of indirection through > different helper functions that *can* do more generic things but in > this case don't. > > And you do all of this from a core VM callback function with some > _really_ core VM locks held. > > That makes no sense to me. > > How about this: > > - get rid of all that code > > - make the core VM callback save the "these are the free memory > regions" in a fixed and limited array. One that DOES JUST THAT. No > crazy "SG IO dma-mapping function crap". Just a plain array of a fixed > size, pre-allocated for that virtio instance. > > - make it obvious that what you do in that sequence is ten > instructions and no allocations ("Look ma, I wrote a value to an array > and incremented the array idex, and I'M DONE") > > - then in that workqueue entry that you start *anyway*, you empty the > array and do all the crazy virtio stuff. > > In fact, while at it, just simplify the VM interface too. Instead of > traversing a random number of buddy lists, just trraverse *one* - the > top-level one. Are you seriously ever going to shrink or mark > read-only anythin *but* something big enough to be in the maximum > order? > > MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really* > want the balloon code to work on smaller things, particularly since > the whole interface is fundamentally racy and opportunistic to begin > with? OK, I will implement a new version based on the suggestions. Thanks. Best, Wei