From: David Matlack
Date: Thu, 26 Jun 2025 09:24:35 -0700
Subject: Re: [RFC v2 10/16] luo: luo_ioctl: add ioctl interface
To: Pratyush Yadav
Cc: Christian Brauner, Pasha Tatashin, jasonmiu@google.com,
 graf@amazon.com, changyuanl@google.com, rppt@kernel.org,
 rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
 ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org,
 aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org,
 tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com,
 roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk,
 mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org,
 hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com,
 joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com,
 song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com
References: <20250515182322.117840-1-pasha.tatashin@soleen.com>
 <20250515182322.117840-11-pasha.tatashin@soleen.com>
 <20250624-akzeptabel-angreifbar-9095f4717ca4@brauner>
 <20250625-akrobatisch-libellen-352997eb08ef@brauner>

On Thu, Jun 26, 2025 at 8:42 AM Pratyush Yadav wrote:
>
> On Wed, Jun 25 2025, David Matlack wrote:
>
> > On Wed, Jun 25, 2025 at 2:36 AM Christian Brauner wrote:
> >> >
> >> > While I agree that a filesystem offers superior introspection and
> >> > integration with standard tools, building this complex, stateful
> >> > orchestration logic on top of VFS seemed to be forcing a square peg
> >> > into a round hole.
> >> > The ioctl interface, while more opaque, provides a
> >> > direct and explicit way to command the state machine and manage these
> >> > complex lifecycle and dependency rules.
> >>
> >> I'm not going to argue that you have to switch to this kexecfs idea
> >> but...
> >>
> >> You're using a character device that's tied to devtmpfs. In other words,
> >> you're already using a filesystem interface. Literally the whole code
> >> here is built on top of filesystem APIs. So this argument is just very
> >> wrong imho. If you can build it on top of a character device using VFS
> >> interfaces you can do it as a minimal filesystem.
> >>
> >> You're free to define the filesystem interface any way you like it. We
> >> have a ton of examples there. All your ioctls would just be tied to the
> >> filesystem instance instead of the /dev/somethingsomething character
> >> device. The state machine could just be implemented the same way.
> >>
> >> One of my points is that with an fs interface you can have easy state
> >> serialization on a per-service level. IOW, you have a bunch of virtual
> >> machines running as services or some networking services or whatever.
> >> You could just bind-mount an instance of kexecfs into the service and
> >> the service can persist state into the instance and easily recover it
> >> after kexec.
> >
> > This approach sounds worth exploring more. It would avoid the need for
> > a centralized daemon to mediate the preservation and restoration of
> > all file descriptors.
>
> One of the jobs of the centralized daemon is to decide the _policy_ of
> who gets to preserve things and more importantly, make sure the right
> party unpreserves the right FDs after a kexec. I don't see how this
> interface fixes this problem. You would still need a way to identify
> which kexecfs instance belongs to who and enforce that. The kernel
> probably shouldn't be the one doing this kind of policy so you still
> need some userspace component to make those decisions.

The main benefit I see of kexecfs is that it avoids needing to send
all FDs over UDS to/from liveupdated, and therefore the need for
dynamic cross-process communication (e.g. RPCs). Instead, something
just needs to set up a kexecfs for each VM when it is created, and
give the same kexecfs back to each VM after kexec. Then VMs are free
to save/restore any FDs in that kexecfs without cross-process
communication or transferring file descriptors.

Policy can be enforced by controlling access to kexecfs mounts. This
naturally fits into the standard architecture of running untrusted VMs
(e.g. using chroots and containers to enforce security and isolation).

> >
> > I'm not sure that we can get rid of the machine-wide state machine
> > though, as there is some kernel state that will necessarily cross
> > these kexecfs domains (e.g. IOMMU driver state). So we still might
> > need /dev/liveupdate for that.
>
> Generally speaking, I think both VFS-based and IOCTL-based interfaces
> are more or less equally expressive/powerful. Most of the ioctl
> operations can be translated to a VFS operation and vice versa.
>
> For example, the fsopen() call is similar to open("/dev/liveupdate") --
> both would create a live update session which auto closes when the FD is
> closed or FS unmounted. Similarly, each ioctl can be replaced with a
> file in the FS. For example, LIVEUPDATE_IOCTL_FD_PRESERVE can be
> replaced with a fd_preserve file where you write() the FD number.
> LIVEUPDATE_IOCTL_GET_STATE or LIVEUPDATE_IOCTL_PREPARE, etc. can be
> replaced by a "state" file where you can read() or write() the state.
>
> I think the main benefit of the VFS-based interface is ease of use.
> There already exist a bunch of utilities and libraries that we can use to
> interact with files. When we have ioctls, we would need to write
> everything ourselves. For example, instead of
> LIVEUPDATE_IOCTL_GET_STATE, you can do "cat state", which is a bit
> easier to do.
>
> As for downsides, I think we might end up with a bit more boilerplate
> code, but beyond that I am not sure.

I agree we can more or less get to the same end state with either
approach. And also, I don't think we have to do one or the other. I
think kexecfs is something that we can build on top of this series.
For example, kexecfs would be a new kernel subsystem that registers
with LUO.
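
As a rough illustration of the ioctl-to-file mapping quoted above, here
is a userspace sketch in Python. Everything in it is an assumption made
for illustration: kexecfs does not exist, the file names fd_preserve and
state are only the ones floated in this thread, the state strings
"normal" and "prepared" are placeholders, and a temporary directory
stands in for a per-VM kexecfs mount so the sketch can run anywhere. It
only shows the shape of such an interface (write an FD number to
preserve it, read or write a file to query or drive the state machine),
not how the kernel side would behave.

```python
import os
import shutil
import tempfile

# Hypothetical file names, taken from the discussion above. No kernel
# interface with these names exists; a plain directory stands in for a
# per-VM kexecfs mount.
FD_PRESERVE_FILE = "fd_preserve"
STATE_FILE = "state"


def make_fake_instance() -> str:
    """Create a temporary directory that mimics one kexecfs instance."""
    instance_dir = tempfile.mkdtemp(prefix="kexecfs-demo-")
    set_state(instance_dir, "normal")  # placeholder state name
    open(os.path.join(instance_dir, FD_PRESERVE_FILE), "w").close()
    return instance_dir


def preserve_fd(instance_dir: str, fd: int) -> None:
    """Stand-in for LIVEUPDATE_IOCTL_FD_PRESERVE: write() the FD number."""
    path = os.path.join(instance_dir, FD_PRESERVE_FILE)
    out = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(out, f"{fd}\n".encode())
    finally:
        os.close(out)


def get_state(instance_dir: str) -> str:
    """Stand-in for LIVEUPDATE_IOCTL_GET_STATE, i.e. "cat state"."""
    with open(os.path.join(instance_dir, STATE_FILE)) as f:
        return f.read().strip()


def set_state(instance_dir: str, state: str) -> None:
    """Stand-in for state-changing ioctls such as LIVEUPDATE_IOCTL_PREPARE."""
    with open(os.path.join(instance_dir, STATE_FILE), "w") as f:
        f.write(state + "\n")


if __name__ == "__main__":
    vm_dir = make_fake_instance()
    read_end, write_end = os.pipe()  # any open FD will do for the demo
    try:
        preserve_fd(vm_dir, read_end)
        set_state(vm_dir, "prepared")  # placeholder state name
        print("state:", get_state(vm_dir))
    finally:
        os.close(read_end)
        os.close(write_end)
        shutil.rmtree(vm_dir)
```

In a real filesystem the access control would come from who can reach
the mount, as described above; the sketch does not model that, nor the
machine-wide state that might still need /dev/liveupdate.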