From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oded Gabbay
Subject: Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)
Date: Wed, 13 Aug 2014 17:09:49 +0300
Message-ID: <53EB71AD.1070904@amd.com>
References: <1407901926-24516-1-git-send-email-j.glisse@gmail.com>
 <1407901926-24516-4-git-send-email-j.glisse@gmail.com>
 <53EB2A91.3000804@vmware.com>
 <20140813104246.GP10500@phenom.ffwll.local>
 <53EB5BA8.3010206@vmware.com>
 <20140813130108.GA10500@phenom.ffwll.local>
In-Reply-To: <20140813130108.GA10500@phenom.ffwll.local>
To: Daniel Vetter, Thomas Hellstrom, Jérôme Glisse
Cc: Michel Dänzer, dri-devel@lists.freedesktop.org, Konrad Rzeszutek Wilk
List-Id: dri-devel@lists.freedesktop.org

On 13/08/14 16:01, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>>>> From: Jérôme Glisse
>>>>>
>>>>> When experiencing memory pressure we want to minimize pool size so that
>>>>> memory we just shrinked is not added back again just as the next thing.
>>>>>
>>>>> This will divide by 2 the maximum pool size for each device each time
>>>>> the pool have to shrink.
>>>>> The limit is bumped again is next allocation
>>>>> happen after one second since the last shrink. The one second delay is
>>>>> obviously an arbitrary choice.
>>>> Jérôme,
>>>>
>>>> I don't like this patch. It adds extra complexity and its usefulness is
>>>> highly questionable.
>>>> There are a number of caches in the system, and if all of them added
>>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>>> impossible-to-debug unpredictable performance issues.
>>>>
>>>> We should let the memory subsystem decide when to reclaim pages from
>>>> caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>>
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
>
> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> so there's not really a way to isolate gpu memory usage in a sane way for
> specific processes.
> But there's also zero limits on actual gpu usage
> itself (timeslices or whatever) so I guess no one asked for this yet.
>
> My comment really was about balancing mm users under the assumption that
> they're all unlimited.
> -Daniel
>

I think the point you brought up becomes very important for compute (HSA)
processes. I still don't know how to distinguish between legitimate use of GPU
local memory and misbehaving/malicious processes.

We have a requirement that HSA processes be allowed to allocate and pin GPU
local memory. They do it through an ioctl.

In the kernel driver, we keep an accounting of those memory allocations,
meaning that I can print a list of all the objects that were allocated by a
certain process, per device.

Therefore, in theory, I can reclaim any object, but that will probably break
the userspace app. If the app is misbehaving/malicious, then that's OK, I
guess. But how do I know that? And what prevents that malicious app from
re-spawning and doing the same allocation again?

	Oded