From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Tan, Jianfeng"
Subject: Re: Running DPDK as an unprivileged user
Date: Wed, 4 Jan 2017 19:39:18 +0800
Message-ID: <685186b4-e50e-c122-459b-e4635404c3f8@intel.com>
References: <1483044080.11975.1.camel@intel.com>
In-Reply-To: <1483044080.11975.1.camel@intel.com>
To: "Walker, Benjamin", "dev@dpdk.org"
List-Id: DPDK patches and discussions

Hi Benjamin,

On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
> Hi all,
>
> I've been digging in to what it would take to run DPDK as an
> unprivileged user and I have some findings that I thought
> were worthy of discussion. The assumptions here are that I'm
> using a very recent Linux kernel (4.8.15 to be specific) and
> I'm using vfio with my IOMMU enabled. I'm only interested in
> making it possible to run as an unprivileged user in this
> type of environment.
>
> There are a few key things that DPDK needs to do in order to
> run as an unprivileged user:
>
> 1) Allocate hugepages
> 2) Map device resources
> 3) Map hugepage virtual addresses to DMA addresses.
>
> For #1 and #2, DPDK works just fine today. You simply chown
> the relevant resources in sysfs to the desired user and
> everything is happy.
>
> The problem is #3. This currently relies on looking up the
> mappings in /proc/self/pagemap, but the ability to get
> physical addresses in /proc/self/pagemap as an unprivileged
> user was removed from the kernel in the 4.x timeframe due to
> the Rowhammer vulnerability. At this time, it is not
> possible to run DPDK as an unprivileged user on a 4.x Linux
> kernel.
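
As a side note, here is a minimal sketch of how one pagemap entry is
decoded, following the layout in the kernel's pagemap documentation
(bit 63 = page present, bits 0-54 = page frame number). It is Python
rather than EAL code, and the helper names are made up for illustration.
When the kernel hides the PFN from an unprivileged reader the field reads
back as 0, so the "physical address" computed this way is only the offset
within the page.

```python
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number (valid if present)
PRESENT_BIT = 1 << 63      # bit 63: page is present in RAM
ENTRY_SIZE = 8             # one 64-bit entry per virtual page


def pagemap_offset(vaddr, page_size):
    """File offset in /proc/self/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * ENTRY_SIZE


def decode_pagemap_entry(entry):
    """Return (present, pfn) for a raw 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    pfn = entry & PFN_MASK if present else 0
    return present, pfn


def virt_to_phys(entry, vaddr, page_size):
    """Physical address for vaddr, or None if the page is not present.

    For an unprivileged reader on a kernel that zeroes the PFN field,
    pfn is 0 and the result is just vaddr's offset inside the page.
    """
    present, pfn = decode_pagemap_entry(entry)
    if not present:
        return None
    return pfn * page_size + (vaddr % page_size)
```

The entry itself would be read as 8 bytes at pagemap_offset(vaddr,
page_size) from /proc/self/pagemap; that part is left out here so the
sketch stays independent of the running kernel.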
>
> There is a way to make this work though, which I'll outline
> now. Unfortunately, I think it is going to require some very
> significant changes to the initialization flow in the EAL.
> One bit of background before I go into how to fix this -
> there are three types of memory addresses - virtual
> addresses, physical addresses, and DMA addresses. Sometimes
> DMA addresses are called bus addresses or I/O addresses, but
> I'll call them DMA addresses because I think that's the
> clearest name. In a system without an IOMMU, DMA addresses
> and physical addresses are equivalent, but in a system with
> an IOMMU any arbitrary DMA address can be chosen by the user
> to map to a given physical address. For security reasons
> (rowhammer), it is no longer considered safe to expose
> physical addresses to userspace, but it is perfectly fine to
> expose DMA addresses when an IOMMU is present.
>
> DPDK today begins by allocating all of the required
> hugepages, then finds all of the physical addresses for
> those hugepages using /proc/self/pagemap, sorts the
> hugepages by physical address, then remaps the pages to
> contiguous virtual addresses. Later on and if vfio is
> enabled, it asks vfio to pin the hugepages and to set their
> DMA addresses in the IOMMU to be the physical addresses
> discovered earlier. Of course, running as an unprivileged
> user means all of the physical addresses in
> /proc/self/pagemap are just 0, so this doesn't end up
> working. Further, there is no real reason to choose the
> physical address as the DMA address in the IOMMU - it would
> be better to just count up starting at 0.

Why not just use the virtual address as the DMA address in this case, to
avoid maintaining another kind of address?

> Also, because the
> pages are pinned after the virtual to physical mapping is
> looked up, there is a window where a page could be moved.
> Hugepage mappings can be moved on more recent kernels (at
> least 4.x), and the reliability of hugepages having static
> mappings decreases with every kernel release.

Do you mean the kernel might take back a physical page after mapping it to
a virtual page (maybe copying the data to another physical page)? Could you
please point to some links or kernel commits?

> Note that this
> probably means that using uio on recent kernels is subtly
> broken and cannot be supported going forward because there
> is no uio mechanism to pin the memory.
>
> The first open question I have is whether DPDK should allow
> uio at all on recent (4.x) kernels. My current understanding
> is that there is no way to pin memory and hugepages can now
> be moved around, so uio would be unsafe. What does the
> community think here?
>
> My second question is whether the user should be allowed to
> mix uio and vfio usage simultaneously. For vfio, the
> physical addresses are really DMA addresses and are best
> when arbitrarily chosen to appear sequential relative to
> their virtual addresses.

Why "sequential relative to their virtual addresses"? The IOMMU table holds
the DMA addr -> physical addr mapping, so don't we need DMA addresses to be
"sequential relative to their physical addresses"? Based on your analysis
above of how hugepages are initialized, isn't the virtual address a good
candidate for the DMA address?

Thanks,
Jianfeng
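
P.S. To make the two options concrete, here is a small sketch (Python,
not EAL code) that models the choice of DMA address for a set of hugepage
segments. Each mapping carries the same three values that a vfio type1
DMA map request carries (vaddr, iova, size); the function and type names
are invented for illustration, and nothing here talks to vfio. The first
policy counts DMA addresses up from 0 in virtual-address order, the
second simply reuses the virtual address.

```python
from typing import List, NamedTuple, Tuple


class DmaMapping(NamedTuple):
    vaddr: int   # process virtual address of the segment
    iova: int    # DMA address programmed into the IOMMU
    size: int    # segment length in bytes


def map_counting_from_zero(segments: List[Tuple[int, int]]) -> List[DmaMapping]:
    """Assign DMA addresses sequentially from 0, in virtual-address order."""
    mappings = []
    next_iova = 0
    for vaddr, size in sorted(segments):
        mappings.append(DmaMapping(vaddr, next_iova, size))
        next_iova += size
    return mappings


def map_identity(segments: List[Tuple[int, int]]) -> List[DmaMapping]:
    """Use each segment's virtual address as its DMA address."""
    return [DmaMapping(vaddr, vaddr, size) for vaddr, size in sorted(segments)]


def virt_to_iova(mappings: List[DmaMapping], addr: int) -> int:
    """Translate a virtual address to the DMA address a device should use."""
    for m in mappings:
        if m.vaddr <= addr < m.vaddr + m.size:
            return m.iova + (addr - m.vaddr)
    raise ValueError("address not inside any mapped segment")
```

With the counting policy, every translation has to go through the table
built at init time. With the identity policy, virt_to_iova() returns its
input for any mapped address, which is the "no extra kind of address"
property I was asking about.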