Date: Thu, 22 Sep 2022 16:00:00 +0100
From: Daniel P. Berrangé
Reply-To: Daniel P. Berrangé
To: Jason Gunthorpe
Cc: Alex Williamson, Eric Auger, "Tian, Kevin", "Rodel, Jorg", Lu Baolu,
    Chaitanya Kulkarni, Cornelia Huck, Daniel Jordan, David Gibson,
    Eric Farman, iommu@lists.linux.dev, Jason Wang, Jean-Philippe Brucker,
    "Martins, Joao", kvm@vger.kernel.org, Matthew Rosato,
    "Michael S. Tsirkin", Nicolin Chen, Niklas Schnelle,
    Shameerali Kolothum Thodi, "Liu, Yi L", Keqian Zhu, Steve Sistare,
    libvir-list@redhat.com, Laine Stump
Subject: Re: [PATCH RFC v2 00/13] IOMMUFD Generic interface
References: <0-v2-f9436d0bde78+4bb-iommufd_jgg@nvidia.com>
    <20220921120649.5d2ff778.alex.williamson@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

On Thu, Sep 22, 2022 at 11:51:54AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 22, 2022 at 03:49:02PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Sep 22, 2022 at 11:08:23AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 22, 2022 at 12:20:50PM +0100, Daniel P. Berrangé wrote:
> > > > On Wed, Sep 21, 2022 at 03:44:24PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Sep 21, 2022 at 12:06:49PM -0600, Alex Williamson wrote:
> > > > > > The issue is where we account these pinned pages, where accounting is
> > > > > > necessary such that a user cannot lock an arbitrary number of pages
> > > > > > into RAM to generate a DoS attack.
> > > > >
> > > > > It is worth pointing out that preventing a DOS attack doesn't actually
> > > > > work because a *task* limit is trivially bypassed by just spawning
> > > > > more tasks. So, as a security feature, this is already very
> > > > > questionable.
> > > >
> > > > The malicious party on host VM hosts is generally the QEMU process.
> > > > QEMU is normally prevented from spawning more tasks, both by SELinux
> > > > controls and by the seccomp sandbox blocking clone() (except for
> > > > thread creation). We need to constrain what any individual QEMU can
> > > > do to the host, and the per-task mem locking limits can do that.
> > >
> > > Even with syscall limits simple things like execve (enabled eg for
> > > qemu self-upgrade) can corrupt the kernel task-based accounting to the
> > > point that the limits don't work.
> >
> > Note, execve is currently blocked by default too by the default
> > seccomp sandbox used with libvirt, as well as by the SELinux
> > policy again. self-upgrade isn't a feature that exists (yet).
>
> That userspace has disabled half the kernel isn't an excuse for the
> kernel to be insecure by design :( This needs to be fixed to enable
> features we know are coming so..
>
> What would libvirt land like to see given task based tracking cannot
> be fixed in the kernel?

There needs to be a mechanism to control individual VMs, whether by
task or by cgroup. User based limits are not suited to what we need
to achieve.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|