From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727661AbgJGR1t (ORCPT ); Wed, 7 Oct 2020 13:27:49 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69AA3C0613D4 for ; Wed, 7 Oct 2020 10:27:49 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id r8so2565238qtp.13 for ; Wed, 07 Oct 2020 10:27:49 -0700 (PDT) Date: Wed, 7 Oct 2020 14:27:46 -0300 From: Jason Gunthorpe Subject: Re: [PATCH 07/13] mm: close race in generic_access_phys Message-ID: <20201007172746.GU5177@ziepe.ca> References: <20201007164426.1812530-1-daniel.vetter@ffwll.ch> <20201007164426.1812530-8-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20201007164426.1812530-8-daniel.vetter@ffwll.ch> List-ID: To: Daniel Vetter Cc: DRI Development , LKML , kvm@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-samsung-soc@vger.kernel.org, linux-media@vger.kernel.org, linux-s390@vger.kernel.org, Dan Williams , Kees Cook , Rik van Riel , Benjamin Herrensmidt , Dave Airlie , Hugh Dickins , Andrew Morton , John Hubbard , =?utf-8?B?SsOpcsO0bWU=?= Glisse , Jan Kara , Daniel Vetter On Wed, Oct 07, 2020 at 06:44:20PM +0200, Daniel Vetter wrote: > Way back it was a reasonable assumptions that iomem mappings never > change the pfn range they point at. But this has changed: > > - gpu drivers dynamically manage their memory nowadays, invalidating > ptes with unmap_mapping_range when buffers get moved > > - contiguous dma allocations have moved from dedicated carvetouts to > cma regions. This means if we miss the unmap the pfn might contain > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > - even /dev/mem now invalidates mappings when the kernel requests that > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > ("/dev/mem: Revoke mappings when a driver claims the region") > > Accessing pfns obtained from ptes without holding all the locks is > therefore no longer a good idea. Fix this. > > Since ioremap might need to manipulate pagetables too we need to drop > the pt lock and have a retry loop if we raced. > > While at it, also add kerneldoc and improve the comment for the > vma_ops->access function. It's for accessing, not for moving the > memory from iomem to system memory, as the old comment seemed to > suggest. > > References: 28b2ee20c7cb ("access_process_vm device memory infrastructure") > Cc: Jason Gunthorpe > Cc: Dan Williams > Cc: Kees Cook > Cc: Rik van Riel > Cc: Benjamin Herrensmidt > Cc: Dave Airlie > Cc: Hugh Dickins > Cc: Andrew Morton > Cc: John Hubbard > Cc: Jérôme Glisse > Cc: Jan Kara > Cc: Dan Williams > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > Signed-off-by: Daniel Vetter > --- > include/linux/mm.h | 3 ++- > mm/memory.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 44 insertions(+), 3 deletions(-) This does seem to solve the race with revoke_devmem(), but it is really ugly. It would be much nicer to wrap a rwsem around this access and the unmap. Any place using it has a nice linear translation from vm_off to pfn, so I don't think there is a such a good reason to use follow_pte in the first place. ie why not the helper be this: int generic_access_phys(unsigned long pfn, unsigned long pgprot, void *buf, size_t len, bool write) Then something like dev/mem would compute pfn and obtain the lock: dev_access(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write) { cpu_addr = vma->vm_pgoff*PAGE_SIZE + (addr - vma->vm_start)); /* FIXME: Has to be over each page of len */ if (!devmem_is_allowed_access(PHYS_PFN(cpu_addr/4096))) return -EPERM; down_read(&mem_sem); generic_access_phys(cpu_addr/4096, pgprot_val(vma->vm_page_prot), buf, len, write); up_read(&mem_sem); } The other cases looked simpler because they don't revoke, here the mmap_sem alone should be enough protection, they would just need to provide the linear translation to pfn. What do you think? Jason