From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7448C43381 for ; Tue, 12 Mar 2019 21:11:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 71FE22077B for ; Tue, 12 Mar 2019 21:11:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726807AbfCLVLY (ORCPT ); Tue, 12 Mar 2019 17:11:24 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55684 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726284AbfCLVLY (ORCPT ); Tue, 12 Mar 2019 17:11:24 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3EB26307D844; Tue, 12 Mar 2019 21:11:23 +0000 (UTC) Received: from sky.random (ovpn-121-1.rdu2.redhat.com [10.10.121.1]) by smtp.corp.redhat.com (Postfix) with ESMTPS id EA5474C3; Tue, 12 Mar 2019 21:11:17 +0000 (UTC) Date: Tue, 12 Mar 2019 17:11:17 -0400 From: Andrea Arcangeli To: James Bottomley Cc: "Michael S. Tsirkin" , Jason Wang , David Miller , hch@infradead.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, peterx@redhat.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-parisc@vger.kernel.org Subject: Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap() Message-ID: <20190312211117.GB25147@redhat.com> References: <56374231-7ba7-0227-8d6d-4d968d71b4d6@redhat.com> <20190311095405-mutt-send-email-mst@kernel.org> <20190311.111413.1140896328197448401.davem@davemloft.net> <6b6dcc4a-2f08-ba67-0423-35787f3b966c@redhat.com> <20190311235140-mutt-send-email-mst@kernel.org> <76c353ed-d6de-99a9-76f9-f258074c1462@redhat.com> <20190312075033-mutt-send-email-mst@kernel.org> <1552405610.3083.17.camel@HansenPartnership.com> <20190312200450.GA25147@redhat.com> <1552424017.14432.11.camel@HansenPartnership.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1552424017.14432.11.camel@HansenPartnership.com> User-Agent: Mutt/1.11.3 (2019-02-01) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Tue, 12 Mar 2019 21:11:23 +0000 (UTC) Sender: linux-parisc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-parisc@vger.kernel.org On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote: > I've got to say: optimize what? What code do we ever have in the > kernel that kmap's a page and then doesn't do anything with it? You can > guarantee that on kunmap the page is either referenced (needs > invalidating) or updated (needs flushing). The in-kernel use of kmap is > always > > kmap > do something with the mapped page > kunmap > > In a very short interval. It seems just a simplification to make > kunmap do the flush if needed rather than try to have the users > remember. The thing which makes this really simple is that on most > architectures flush and invalidate is the same operation. If you > really want to optimize you can use the referenced and dirty bits on > the kmapped pte to tell you what operation to do, but if your flush is > your invalidate, you simply assume the data needs flushing on kunmap > without checking anything. Except other archs like arm64 and sparc do the cache flushing on copy_to_user_page and copy_user_page, not on kunmap. #define copy_user_page(to,from,vaddr,pg) __cpu_copy_user_page(to, from, vaddr) void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr) { struct page *page = virt_to_page(kto); copy_page(kto, kfrom); flush_dcache_page(page); } #define copy_user_page(to, from, vaddr, page) \ do { copy_page(to, from); \ sparc_flush_page_to_ram(page); \ } while (0) And they do nothing on kunmap: static inline void kunmap(struct page *page) { BUG_ON(in_interrupt()); if (!PageHighMem(page)) return; kunmap_high(page); } void kunmap_high(struct page *page) { unsigned long vaddr; unsigned long nr; unsigned long flags; int need_wakeup; unsigned int color = get_pkmap_color(page); wait_queue_head_t *pkmap_map_wait; lock_kmap_any(flags); vaddr = (unsigned long)page_address(page); BUG_ON(!vaddr); nr = PKMAP_NR(vaddr); /* * A count must never go down to zero * without a TLB flush! */ need_wakeup = 0; switch (--pkmap_count[nr]) { case 0: BUG(); case 1: /* * Avoid an unnecessary wake_up() function call. * The common case is pkmap_count[] == 1, but * no waiters. * The tasks queued in the wait-queue are guarded * by both the lock in the wait-queue-head and by * the kmap_lock. As the kmap_lock is held here, * no need for the wait-queue-head's lock. Simply * test if the queue is empty. */ pkmap_map_wait = get_pkmap_wait_queue_head(color); need_wakeup = waitqueue_active(pkmap_map_wait); } unlock_kmap_any(flags); /* do wake-up, if needed, race-free outside of the spin lock */ if (need_wakeup) wake_up(pkmap_map_wait); } static inline void kunmap(struct page *page) { } because they already did it just above. > > Which means after we fix vhost to add the flush_dcache_page after > > kunmap, Parisc will get a double hit (but it also means Parisc was > > the only one of those archs needed explicit cache flushes, where > > vhost worked correctly so far.. so it kinds of proofs your point of > > giving up being the safe choice). > > What double hit? If there's no cache to flush then cache flush is a > no-op. It's also a highly piplineable no-op because the CPU has the L1 > cache within easy reach. The only event when flush takes a large > amount time is if we actually have dirty data to write back to main > memory. The double hit is in parisc copy_to_user_page: #define copy_to_user_page(vma, page, vaddr, dst, src, len) \ do { \ flush_cache_page(vma, vaddr, page_to_pfn(page)); \ memcpy(dst, src, len); \ flush_kernel_dcache_range_asm((unsigned long)dst, (unsigned long)dst + len); \ } while (0) That is executed just before kunmap: static inline void kunmap(struct page *page) { flush_kernel_dcache_page_addr(page_address(page)); } Can't argue about the fact your "safer" kunmap is safer, but we cannot rely on common code unless we remove some optimization from the common code abstractions and we make all archs do kunmap like parisc. Thanks, Andrea From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS, URIBL_BLOCKED,USER_AGENT_MUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E0A1C43381 for ; Wed, 13 Mar 2019 10:25:08 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5A10D2087C for ; Wed, 13 Mar 2019 10:25:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="ed4RQnRh"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Kp5aNnba" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5A10D2087C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=rHFAhS3D8bLImwWYY3ibpDdp2litMD8gqtmhrnkOxrU=; b=ed4RQnRhc+CHZh vnHq/pk+XqHdWjFNEGBHGiNDcocGkAjGzvb/XosWTrc0NU8F4ICvdj4VqrnN6eIit3IMZLr1mVV60 DKzjLrWraaNqmAGY01gsBBErYMDmNdWmJXKOZ3c5vkIKl0tjCwTn9JU5t8sJ3/rL0YgL4oBbfDqCT EEnR6jzBNWtqMo497XvmVx9DrrbEvtSZ4FOlTO8H5yjiRNgKS9CaaJMmBIEg5yOTw/gYpX8AlFlL8 tfSClkmYI7IKB31eF1DRS89hMeSaiQp+mmonszzGprQDh4LIDsbQAdcQ7M2cf+mKt6BlczrcTQFVA +W8B3Vb4XFwJ99WI8jXA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1h414H-0003gw-5D; Wed, 13 Mar 2019 10:25:01 +0000 Received: from casper.infradead.org ([2001:8b0:10b:1236::1]) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1h40tC-0000TD-1c for linux-arm-kernel@bombadil.infradead.org; Wed, 13 Mar 2019 10:13:34 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=6D85umQO+3uCuVnh3eG7mrlbQDpMKyiC7PlzHjAJQLM=; b=Kp5aNnba7YAPr4XCnFvIpOl5c 9VqisIhMUMZMMeOggDbXbQnZfOh3gNKuuUq0Y6GRpQK9jbWjgHX0mmFw65b/ptmzXPOdiraaaF1Nl IYpR1wC2IKgg2PK5ueuBttu8Ia63bpoz+VkmFjAhxCrfdKz92NPsH7XKBkEm93nPUVlmNIz5Kh0Z3 0P9JCX+KzxVx2+lvPLVW1EdTQCQnotyEcEVY+s+HC8xpwTuwoSZZKD1gqjPIGrR4ibcErNrFYqV0Y 8xVkpp0AReckaOFT3rx0DlgHICDG3z2eJmVMIX0C0/l7YDrAKf7DGx+gerrL6WS+NbUYzrT1WsR4e 2JrnEUxhA==; Received: from mx1.redhat.com ([209.132.183.28]) by casper.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1h3oi2-0005yQ-4Y for linux-arm-kernel@lists.infradead.org; Tue, 12 Mar 2019 21:13:16 +0000 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3EB26307D844; Tue, 12 Mar 2019 21:11:23 +0000 (UTC) Received: from sky.random (ovpn-121-1.rdu2.redhat.com [10.10.121.1]) by smtp.corp.redhat.com (Postfix) with ESMTPS id EA5474C3; Tue, 12 Mar 2019 21:11:17 +0000 (UTC) Date: Tue, 12 Mar 2019 17:11:17 -0400 From: Andrea Arcangeli To: James Bottomley Subject: Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap() Message-ID: <20190312211117.GB25147@redhat.com> References: <56374231-7ba7-0227-8d6d-4d968d71b4d6@redhat.com> <20190311095405-mutt-send-email-mst@kernel.org> <20190311.111413.1140896328197448401.davem@davemloft.net> <6b6dcc4a-2f08-ba67-0423-35787f3b966c@redhat.com> <20190311235140-mutt-send-email-mst@kernel.org> <76c353ed-d6de-99a9-76f9-f258074c1462@redhat.com> <20190312075033-mutt-send-email-mst@kernel.org> <1552405610.3083.17.camel@HansenPartnership.com> <20190312200450.GA25147@redhat.com> <1552424017.14432.11.camel@HansenPartnership.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1552424017.14432.11.camel@HansenPartnership.com> User-Agent: Mutt/1.11.3 (2019-02-01) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Tue, 12 Mar 2019 21:11:23 +0000 (UTC) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190312_211314_468146_4BF419FA X-CRM114-Status: GOOD ( 33.29 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-parisc@vger.kernel.org, kvm@vger.kernel.org, "Michael S. Tsirkin" , netdev@vger.kernel.org, Jason Wang , linux-kernel@vger.kernel.org, peterx@redhat.com, virtualization@lists.linux-foundation.org, hch@infradead.org, linux-mm@kvack.org, David Miller , linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote: > I've got to say: optimize what? What code do we ever have in the > kernel that kmap's a page and then doesn't do anything with it? You can > guarantee that on kunmap the page is either referenced (needs > invalidating) or updated (needs flushing). The in-kernel use of kmap is > always > > kmap > do something with the mapped page > kunmap > > In a very short interval. It seems just a simplification to make > kunmap do the flush if needed rather than try to have the users > remember. The thing which makes this really simple is that on most > architectures flush and invalidate is the same operation. If you > really want to optimize you can use the referenced and dirty bits on > the kmapped pte to tell you what operation to do, but if your flush is > your invalidate, you simply assume the data needs flushing on kunmap > without checking anything. Except other archs like arm64 and sparc do the cache flushing on copy_to_user_page and copy_user_page, not on kunmap. #define copy_user_page(to,from,vaddr,pg) __cpu_copy_user_page(to, from, vaddr) void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr) { struct page *page = virt_to_page(kto); copy_page(kto, kfrom); flush_dcache_page(page); } #define copy_user_page(to, from, vaddr, page) \ do { copy_page(to, from); \ sparc_flush_page_to_ram(page); \ } while (0) And they do nothing on kunmap: static inline void kunmap(struct page *page) { BUG_ON(in_interrupt()); if (!PageHighMem(page)) return; kunmap_high(page); } void kunmap_high(struct page *page) { unsigned long vaddr; unsigned long nr; unsigned long flags; int need_wakeup; unsigned int color = get_pkmap_color(page); wait_queue_head_t *pkmap_map_wait; lock_kmap_any(flags); vaddr = (unsigned long)page_address(page); BUG_ON(!vaddr); nr = PKMAP_NR(vaddr); /* * A count must never go down to zero * without a TLB flush! */ need_wakeup = 0; switch (--pkmap_count[nr]) { case 0: BUG(); case 1: /* * Avoid an unnecessary wake_up() function call. * The common case is pkmap_count[] == 1, but * no waiters. * The tasks queued in the wait-queue are guarded * by both the lock in the wait-queue-head and by * the kmap_lock. As the kmap_lock is held here, * no need for the wait-queue-head's lock. Simply * test if the queue is empty. */ pkmap_map_wait = get_pkmap_wait_queue_head(color); need_wakeup = waitqueue_active(pkmap_map_wait); } unlock_kmap_any(flags); /* do wake-up, if needed, race-free outside of the spin lock */ if (need_wakeup) wake_up(pkmap_map_wait); } static inline void kunmap(struct page *page) { } because they already did it just above. > > Which means after we fix vhost to add the flush_dcache_page after > > kunmap, Parisc will get a double hit (but it also means Parisc was > > the only one of those archs needed explicit cache flushes, where > > vhost worked correctly so far.. so it kinds of proofs your point of > > giving up being the safe choice). > > What double hit? If there's no cache to flush then cache flush is a > no-op. It's also a highly piplineable no-op because the CPU has the L1 > cache within easy reach. The only event when flush takes a large > amount time is if we actually have dirty data to write back to main > memory. The double hit is in parisc copy_to_user_page: #define copy_to_user_page(vma, page, vaddr, dst, src, len) \ do { \ flush_cache_page(vma, vaddr, page_to_pfn(page)); \ memcpy(dst, src, len); \ flush_kernel_dcache_range_asm((unsigned long)dst, (unsigned long)dst + len); \ } while (0) That is executed just before kunmap: static inline void kunmap(struct page *page) { flush_kernel_dcache_page_addr(page_address(page)); } Can't argue about the fact your "safer" kunmap is safer, but we cannot rely on common code unless we remove some optimization from the common code abstractions and we make all archs do kunmap like parisc. Thanks, Andrea _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel