From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 250A0C433ED for ; Tue, 18 May 2021 19:01:37 +0000 (UTC) Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AC04160FE9 for ; Tue, 18 May 2021 19:01:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AC04160FE9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding :Content-Type:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-ID: Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=byfO7RRYjvmlXKX0B3LWhX08A77AuNdoH/+fi7+1qKU=; b=ae64noMIpzuMxIAVSXz6FHOZN wzmf2Ne1RKQNL4wN+OlRkOO4WgJ7lSDFOYuxXBbLj5KZACGSKb8GtgolIDlUfhnc9c8Ilprfc1lgj iZszm6G8DLDCRk84nkvKpLENxn1YxQEEyk2BGjC64YhCzMSU87KtlebR/1aFia7rBbQ5oj5dcHHG0 rPboRXeHtOmDHZPzL1g87085xm7o+nJ8pLTg+GatjFZ8ZqvgOpaoNR2duOD0ZuIeAIUvVpw6yT4Z+ +kV07Qsu4uHzQdK5wC5a/9sYaHZzGTobws6Xd+ry19VqbhgsRuVAZSRAfIHtzVHV1Vi9/nRNperrD K96YSi6mA==; Received: from localhost ([::1] helo=desiato.infradead.org) by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1lj4w6-001fip-FZ; Tue, 18 May 2021 18:59:22 +0000 Received: from bombadil.infradead.org ([2607:7c80:54:e::133]) by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lj4vt-001fgL-1u for linux-arm-kernel@desiato.infradead.org; Tue, 18 May 2021 18:59:09 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Subject:Cc:To: From:Date:Sender:Reply-To:Content-ID:Content-Description; bh=l3I+sRgsUz3jPt3eJWkIzspXHxTfKP9apTdFMbo1EQ4=; b=Eqbfn3teKz1z7He2X905/sW8+V gT8jlT9WD7dO0u735D9w4skoEafl/epomaEhWuhCmiMSHqODr73vna/GL8yQhU/1Lzd3HJHI0FD6R UwloOg5uPNPZuNEAufLuaFF1QlmfY6Q+jJ2nsWN6qqFXdGz4PmL4/bVNBaSM8Bk0QDVnkZS0shKJ9 /ROe3K9r3eAvwXTTyKR8882WRCnrCckKyzDx6luPpA46xKA2R/EdrsPNTgFAuvCNhw0f0y/dVyXyf MVDwrPgQIhLkIQ9Iavr+fp2q3A352QQb7oD1U9VZk6qY4RHzBHdUlaDbYSjuoAYLBXDDCqR9zjUQb 2kvyVj8w==; Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by bombadil.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lj4vp-00Eter-PH for linux-arm-kernel@lists.infradead.org; Tue, 18 May 2021 18:59:07 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1621364343; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=l3I+sRgsUz3jPt3eJWkIzspXHxTfKP9apTdFMbo1EQ4=; b=D3T/kdOBui1Go/AjQlLpFBKCXcVgG1jqnBDZb5hHX93JslcNCKLrE5glcZ7OZo5o2NAahM cf4h56GDHFdr8ELQ7fiooSyQTG4FSL8GCJSX+/CzIheLvxGPFP83jhBNxIRT+69S94O74v FWzoCB/k5aNztUGo3Dehjc5XgogVWrM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-98--oKuxaE4Nhqt5xTqP29FOg-1; Tue, 18 May 2021 14:59:01 -0400 X-MC-Unique: -oKuxaE4Nhqt5xTqP29FOg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 09369192AB81; Tue, 18 May 2021 18:58:59 +0000 (UTC) Received: from redhat.com (ovpn-113-225.phx2.redhat.com [10.3.113.225]) by smtp.corp.redhat.com (Postfix) with ESMTP id 20D581037E81; Tue, 18 May 2021 18:58:58 +0000 (UTC) Date: Tue, 18 May 2021 12:58:37 -0600 From: Alex Williamson To: Shenming Lu Cc: Cornelia Huck , Will Deacon , Robin Murphy , Joerg Roedel , Jean-Philippe Brucker , Eric Auger , , , , , , Kevin Tian , Lu Baolu , , Christoph Hellwig , Jonathan Cameron , Barry Song , , Subject: Re: [RFC PATCH v3 2/8] vfio/type1: Add a page fault handler Message-ID: <20210518125837.6de73631.alex.williamson@redhat.com> In-Reply-To: <20210409034420.1799-3-lushenming@huawei.com> References: <20210409034420.1799-1-lushenming@huawei.com> <20210409034420.1799-3-lushenming@huawei.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210518_115905_940496_AA7CC026 X-CRM114-Status: GOOD ( 33.68 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Fri, 9 Apr 2021 11:44:14 +0800 Shenming Lu wrote: > VFIO manages the DMA mapping itself. To support IOPF (on-demand paging) > for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to > serve the reported page faults from the IOMMU driver. > > Signed-off-by: Shenming Lu > --- > drivers/vfio/vfio_iommu_type1.c | 114 ++++++++++++++++++++++++++++++++ > 1 file changed, 114 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 45cbfd4879a5..ab0ff60ee207 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -101,6 +101,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + unsigned long *iopf_mapped_bitmap; > }; > > struct vfio_batch { > @@ -141,6 +142,16 @@ struct vfio_regions { > size_t len; > }; > > +/* A global IOPF enabled group list */ > +static struct rb_root iopf_group_list = RB_ROOT; > +static DEFINE_MUTEX(iopf_group_list_lock); > + > +struct vfio_iopf_group { > + struct rb_node node; > + struct iommu_group *iommu_group; > + struct vfio_iommu *iommu; > +}; > + > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ > (!list_empty(&iommu->domain_list)) > > @@ -157,6 +168,10 @@ struct vfio_regions { > #define DIRTY_BITMAP_PAGES_MAX ((u64)INT_MAX) > #define DIRTY_BITMAP_SIZE_MAX DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX) > > +#define IOPF_MAPPED_BITMAP_GET(dma, i) \ > + ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \ > + >> ((i) % BITS_PER_LONG)) & 0x1) Can't we just use test_bit()? > + > #define WAITED 1 > > static int put_pfn(unsigned long pfn, int prot); > @@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn) > return ret; > } > > +/* > + * Helper functions for iopf_group_list > + */ > +static struct vfio_iopf_group * > +vfio_find_iopf_group(struct iommu_group *iommu_group) > +{ > + struct vfio_iopf_group *iopf_group; > + struct rb_node *node; > + > + mutex_lock(&iopf_group_list_lock); > + > + node = iopf_group_list.rb_node; > + > + while (node) { > + iopf_group = rb_entry(node, struct vfio_iopf_group, node); > + > + if (iommu_group < iopf_group->iommu_group) > + node = node->rb_left; > + else if (iommu_group > iopf_group->iommu_group) > + node = node->rb_right; > + else > + break; > + } > + > + mutex_unlock(&iopf_group_list_lock); > + return node ? iopf_group : NULL; > +} This looks like a pretty heavy weight operation per DMA fault. I'm also suspicious of this validity of this iopf_group after we've dropped the locking, the ordering of patches makes this very confusing. > + > static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) > { > struct mm_struct *mm; > @@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu, > return -EINVAL; > } > > +/* VFIO I/O Page Fault handler */ > +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data) >From the comment, this seems like the IOMMU fault handler (the construction of this series makes this difficult to follow) and eventually it handles more than DMA mapping, for example transferring faults to the device driver. "dma_map_iopf" seems like a poorly scoped name. > +{ > + struct device *dev = (struct device *)data; > + struct iommu_group *iommu_group; > + struct vfio_iopf_group *iopf_group; > + struct vfio_iommu *iommu; > + struct vfio_dma *dma; > + dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE); > + int access_flags = 0; > + unsigned long bit_offset, vaddr, pfn; > + int ret; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; > + struct iommu_page_response resp = {0}; > + > + if (fault->type != IOMMU_FAULT_PAGE_REQ) > + return -EOPNOTSUPP; > + > + iommu_group = iommu_group_get(dev); > + if (!iommu_group) > + return -ENODEV; > + > + iopf_group = vfio_find_iopf_group(iommu_group); > + iommu_group_put(iommu_group); > + if (!iopf_group) > + return -ENODEV; > + > + iommu = iopf_group->iommu; > + > + mutex_lock(&iommu->lock); Again, I'm dubious of our ability to grab this lock from an object with an unknown lifecycle and races we might have with that group being detached or DMA unmapped. Also, how effective is enabling IOMMU page faulting if we're serializing all faults within a container context? > + > + ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma); > + if (ret < 0) > + goto out_invalid; > + > + if (fault->prm.perm & IOMMU_FAULT_PERM_READ) > + access_flags |= IOMMU_READ; > + if (fault->prm.perm & IOMMU_FAULT_PERM_WRITE) > + access_flags |= IOMMU_WRITE; > + if ((dma->prot & access_flags) != access_flags) > + goto out_invalid; > + > + bit_offset = (iova - dma->iova) >> PAGE_SHIFT; > + if (IOPF_MAPPED_BITMAP_GET(dma, bit_offset)) > + goto out_success; If the page is mapped, why did we get a fault? Should we be returning success for a fault we shouldn't have received and did nothing to resolve? We're also referencing a bitmap that has only been declared and never allocated at this point in the patch series. > + > + vaddr = iova - dma->iova + dma->vaddr; > + > + if (vfio_pin_page_external(dma, vaddr, &pfn, true)) > + goto out_invalid; > + > + if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) { > + if (put_pfn(pfn, dma->prot)) > + vfio_lock_acct(dma, -1, true); > + goto out_invalid; > + } > + > + bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1); > + > +out_success: > + status = IOMMU_PAGE_RESP_SUCCESS; > + > +out_invalid: > + mutex_unlock(&iommu->lock); > + resp.version = IOMMU_PAGE_RESP_VERSION_1; > + resp.grpid = fault->prm.grpid; > + resp.code = status; > + iommu_page_response(dev, &resp); > + return 0; > +} > + > static long vfio_iommu_type1_ioctl(void *iommu_data, > unsigned int cmd, unsigned long arg) > { _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel