From mboxrd@z Thu Jan  1 00:00:00 1970
From: Oded Gabbay <oded.gabbay@amd.com>
Subject: Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure
 minimize the size of memory pool)
Date: Wed, 13 Aug 2014 17:09:49 +0300
Message-ID: <53EB71AD.1070904@amd.com>
References: <1407901926-24516-1-git-send-email-j.glisse@gmail.com>
 <1407901926-24516-4-git-send-email-j.glisse@gmail.com>
 <53EB2A91.3000804@vmware.com> <20140813104246.GP10500@phenom.ffwll.local>
 <53EB5BA8.3010206@vmware.com> <20140813130108.GA10500@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from na01-by2-obe.outbound.protection.outlook.com
 (mail-by2lp0243.outbound.protection.outlook.com [207.46.163.243])
 by gabe.freedesktop.org (Postfix) with ESMTP id 96DF66E57E
 for <dri-devel@lists.freedesktop.org>; Wed, 13 Aug 2014 07:10:07 -0700 (PDT)
In-Reply-To: <20140813130108.GA10500@phenom.ffwll.local>
To: Daniel Vetter <daniel@ffwll.ch>, Thomas Hellstrom <thellstrom@vmware.com>, Jérôme Glisse <jglisse@redhat.com>
Cc: Michel Dänzer <michel@daenzer.net>, dri-devel@lists.freedesktop.org, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: dri-devel@lists.freedesktop.org


On 13/08/14 16:01, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>>>
>>>>> When experiencing memory pressure we want to minimize pool size so that
>>>>> memory we just shrinked is not added back again just as the next thing.
>>>>>
>>>>> This will divide by 2 the maximum pool size for each device each time
>>>>> the pool has to shrink. The limit is bumped again if the next allocation
>>>>> happens more than one second after the last shrink. The one second delay
>>>>> is obviously an arbitrary choice.
>>>> Jérôme,
>>>>
>>>> I don't like this patch. It adds extra complexity and its usefulness is
>>>> highly questionable.
>>>> There are a number of caches in the system, and if all of them added
>>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>>> impossible-to-debug unpredictable performance issues.
>>>>
>>>> We should let the memory subsystem decide when to reclaim pages from
>>>> caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>>
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
>
> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> so there's not really a way to isolate gpu memory usage in a sane way for
> specific processes. But there's also zero limits on actual gpu usage
> itself (timeslices or whatever) so I guess no one asked for this yet.
>
> My comment really was about balancing mm users under the assumption that
> they're all unlimited.
> -Daniel
>
I think the point you brought up becomes very important for compute (HSA)
processes. I still don't know how to distinguish between legitimate use of
GPU local memory and misbehaving/malicious processes.

We have a requirement that HSA processes will be allowed to allocate and
pin GPU local memory. They do it through an ioctl.
In the kernel driver, we have an accounting of those memory allocations,
meaning that I can print a list of all the objects that were allocated by a
certain process, per device.
Therefore, in theory, I can reclaim any object, but that will probably
break the userspace app. If the app is misbehaving/malicious, then that's
OK, I guess. But how do I know that? And what prevents that malicious app
from re-spawning and doing the same allocation again?
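
To make the accounting idea concrete, here is a minimal userspace sketch,
assuming a simple linked list of records keyed by process and device. All
names (pinned_obj, acct, acct_add and so on) are hypothetical and are not
taken from the actual driver; it only illustrates listing and releasing
objects per process, not the policy question of when to do so.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of one pinned GPU-local allocation. */
struct pinned_obj {
    int pid;                  /* owning process */
    int dev;                  /* device index */
    size_t size;              /* bytes pinned */
    struct pinned_obj *next;
};

/* Hypothetical accounting table: a singly linked list of records. */
struct acct {
    struct pinned_obj *head;
};

/* Record one allocation; returns 0 on success, -1 if out of memory. */
static int acct_add(struct acct *a, int pid, int dev, size_t size)
{
    struct pinned_obj *o;

    assert(a != NULL);
    o = malloc(sizeof(*o));
    if (!o)
        return -1;
    o->pid = pid;
    o->dev = dev;
    o->size = size;
    o->next = a->head;
    a->head = o;
    return 0;
}

/* Total bytes pinned by one process on one device. */
static size_t acct_pinned_bytes(const struct acct *a, int pid, int dev)
{
    size_t total = 0;

    for (const struct pinned_obj *o = a->head; o; o = o->next)
        if (o->pid == pid && o->dev == dev)
            total += o->size;
    return total;
}

/* Print every object a process holds on a device; returns the count. */
static size_t acct_print(const struct acct *a, int pid, int dev)
{
    size_t n = 0;

    for (const struct pinned_obj *o = a->head; o; o = o->next) {
        if (o->pid != pid || o->dev != dev)
            continue;
        printf("pid %d dev %d: %zu bytes\n", o->pid, o->dev, o->size);
        n++;
    }
    return n;
}

/* Drop all records of a process on every device (for example on exit or
 * forced reclaim); returns the number of bytes released. */
static size_t acct_release_pid(struct acct *a, int pid)
{
    size_t freed = 0;
    struct pinned_obj **pp = &a->head;

    while (*pp) {
        struct pinned_obj *o = *pp;

        if (o->pid == pid) {
            *pp = o->next;    /* unlink before freeing */
            freed += o->size;
            free(o);
        } else {
            pp = &o->next;
        }
    }
    return freed;
}
```

With a table like this, acct_print() gives the per-process, per-device
list, and acct_release_pid() is the "reclaim everything" hammer. It still
says nothing about whether the process deserved it.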

	Oded