From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41025) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vgz41-0007lD-1m for qemu-devel@nongnu.org; Thu, 14 Nov 2013 10:42:44 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Vgz3t-0005T6-KU for qemu-devel@nongnu.org; Thu, 14 Nov 2013 10:42:36 -0500 Received: from mail-bk0-f44.google.com ([209.85.214.44]:36465) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vgz3t-0005Sz-E8 for qemu-devel@nongnu.org; Thu, 14 Nov 2013 10:42:29 -0500 Received: by mail-bk0-f44.google.com with SMTP id d7so1139577bkh.17 for ; Thu, 14 Nov 2013 07:42:28 -0800 (PST) Message-ID: <5284EF62.8070108@cloudius-systems.com> Date: Thu, 14 Nov 2013 17:42:26 +0200 From: Avi Kivity MIME-Version: 1.0 References: <1384370613-6433-1-git-send-email-mst@redhat.com> <1384370613-6433-6-git-send-email-mst@redhat.com> <20131114144037.GA6635@redhat.com> <5284E4AB.2050900@cloudius-systems.com> <20131114153718.GA6654@redhat.com> In-Reply-To: <20131114153718.GA6654@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 5/7] exec: memory radix tree page level?compression List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: qemu-devel@nongnu.org On 11/14/2013 05:37 PM, Michael S. Tsirkin wrote: > On Thu, Nov 14, 2013 at 04:56:43PM +0200, Avi Kivity wrote: >> On 11/14/2013 04:40 PM, Michael S. Tsirkin wrote: >>> On Thu, Nov 14, 2013 at 08:54:11AM +0000, Avi Kivity wrote: >>>> Michael S. Tsirkin redhat.com> writes: >>>> >>>>> At the moment, memory radix tree is already variable width, but it can >>>>> only skip the low bits of address. >>>>> >>>>> This is efficient if we have huge memory regions but inefficient if we >>>>> are only using a tiny portion of the address space. >>>>> >>>>> After we have built up the map, detect >>>>> configurations where a single L2 entry is valid. >>>>> >>>>> We then speed up the lookup by skipping one or more levels. >>>>> In case any levels were skipped, we might end up in a valid section >>>>> instead of erroring out. We handle this by checking that >>>>> the address is in range of the resulting section. >>>>> >>>> I think this is overkill. It can be done in a simpler way as follows: >>>> >>>> >>>> phys_page_find(RadixTree* tr, hwaddr index, ...) >>>> { >>>> if (index & rt->invalid_index_mask) { >>>> // not found >>>> } >>>> lp = rt->root; >>>> for (i = rt->nb_levels - 1; i >= 0 && !lp.is_leaf; --i) { >>>> ... >>>> >>>> This exploits the fact the lower portion of the address space is always >>>> filled, at least in the cases that matter to us. >>>> >>>> >>>> >>>> >>>> >>> Basically skip unused high bits? >>> Sure. >>> In fact I think both optimizations can be combined. >> Not much value in combining them -- the variable level tree check >> will be dominated by the level skip logic. >> >> IMO however skipping intermediate levels will be too rare to justify >> the complexity and the doubling of the page table size -- it can >> only happen in iommu setups that place memory in very high >> addresses. These ought to be rare. >> > Well maybe not very high address, but you can have a device with > a 64 bit bar and this will add back levels (though it would not > be slower than it is currently). > > I agree the simplicity is attractive. > > However I really would like some logic that can handle > 1 leaf > somehow. > > Specifically both tricks break if we add a full 64 bit io region > in order to return -1 for reads on master abort. That can be achieved by directing a missed lookup to a catch-all region. It would be best to do this completely internally to the memory code, without API changes.