From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Gunthorpe
Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Date: Thu, 25 Mar 2021 14:16:45 -0300
Message-ID: <20210325171645.GF2356281@nvidia.com>
References: <20210319124645.GP2356281@nvidia.com>
 <20210319135432.GT2356281@nvidia.com>
 <20210319112221.5123b984@jacob-builder>
 <20210324100246.4e6b8aa1@jacob-builder>
 <20210324170338.GM2356281@nvidia.com>
 <20210324151230.466fd47a@jacob-builder>
 <20210325100236.17241a1c@jacob-builder>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20210325100236.17241a1c@jacob-builder>
Sender: "iommu"
To: Jacob Pan
Cc: Jean-Philippe Brucker, "Tian, Kevin", Alex Williamson, Raj Ashok,
 Jonathan Corbet, Jean-Philippe Brucker, LKML, Dave Jiang,
 iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 Li Zefan, Johannes Weiner, Tejun Heo,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Wu Hao, David Woodhouse

On Thu, Mar 25, 2021 at 10:02:36AM -0700, Jacob Pan wrote:
> Hi Jean-Philippe,
>
> On Thu, 25 Mar 2021 11:21:40 +0100, Jean-Philippe Brucker
> wrote:
>
> > On Wed, Mar 24, 2021 at 03:12:30PM -0700, Jacob Pan
> > wrote:
> > > Hi Jason,
> > >
> > > On Wed, 24 Mar 2021 14:03:38 -0300, Jason Gunthorpe
> > > wrote:
> > > > On Wed, Mar 24, 2021 at 10:02:46AM -0700, Jacob Pan wrote:
> > > > > > Also wondering about device driver allocating auxiliary domains
> > > > > > for their private use, to do iommu_map/unmap on private PASIDs (a
> > > > > > clean replacement to super SVA, for example). Would that go
> > > > > > through the same path as /dev/ioasid and use the cgroup of
> > > > > > current task?
> > > > >
> > > > > For the in-kernel private use, I don't think we should restrict
> > > > > based on cgroup, since there is no affinity to user processes. I
> > > > > also think the PASID allocation should just use kernel API instead
> > > > > of /dev/ioasid. Why would user space need to know the actual PASID
> > > > > # for device private domains? Maybe I missed your idea?
> > > >
> > > > There is not much in the kernel that isn't triggered by a process, I
> > > > would be careful about the idea that there is a class of users that
> > > > can consume a cgroup controlled resource without being inside the
> > > > cgroup.
> > > >
> > > > We've got into trouble before overlooking this and with something
> > > > greenfield like PASID it would be best built in to the API to prevent
> > > > a mistake. eg accepting a cgroup or process input to the allocator.
> > > >
> > > Make sense. But I think we only allow charging the current cgroup, how
> > > about I add the following to ioasid_alloc():
> > >
> > >     misc_cg = get_current_misc_cg();
> > >     ret = misc_cg_try_charge(MISC_CG_RES_IOASID, misc_cg, 1);
> > >     if (ret) {
> > >         put_misc_cg(misc_cg);
> > >         return ret;
> > >     }
> >
> > Does that allow PASID allocation during driver probe, in kernel_init or
> > modprobe context?
> >
> Good point. Yes, you can get cgroup subsystem state in kernel_init for
> charging/uncharging. I would think module_init should work also since it is
> after kernel_init.
> I have tried the following:
>  static int __ref kernel_init(void *unused)
>  {
>         int ret;
> +       struct cgroup_subsys_state *css;
> +       css = task_get_css(current, pids_cgrp_id);
>
> But that would imply:
> 1. IOASID has to be built-in, not as module
> 2. IOASIDs charged on PID1/init would not subject to cgroup limit since it
> will be in the root cgroup and we don't support migration nor will migrate.
>
> Then it comes back to the question of why do we try to limit in-kernel
> users per cgroup if we can't enforce these cases.

Are these real use cases? Why would a driver binding to a device
create a single kernel pasid at bind time? Why wouldn't it use
untagged DMA?

When someone needs it they can rework it and explain why they are
doing something sane.

Jason