Date: Sun, 6 Jul 2025 17:33:04 +0300
From: Mike Rapoport
To: Pratyush Yadav
Cc: David Matlack, Christian Brauner, Pasha Tatashin, jasonmiu@google.com, graf@amazon.com, changyuanl@google.com, rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
	masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org, bartosz.golaszewski@linaro.org, cw00.choi@samsung.com, myungjoo.ham@samsung.com, yesanishhere@gmail.com, Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com, aleksander.lobakin@intel.com, ira.weiny@intel.com, andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com, stuart.w.hayes@gmail.com
Subject: Re: [RFC v2 10/16] luo: luo_ioctl: add ioctl interface
References: <20250515182322.117840-1-pasha.tatashin@soleen.com>
	<20250515182322.117840-11-pasha.tatashin@soleen.com>
	<20250624-akzeptabel-angreifbar-9095f4717ca4@brauner>
	<20250625-akrobatisch-libellen-352997eb08ef@brauner>

On Thu, Jun 26, 2025 at 05:42:28PM +0200, Pratyush Yadav wrote:
> On Wed, Jun 25 2025, David Matlack wrote:
>
> > On Wed, Jun 25, 2025 at 2:36 AM Christian Brauner wrote:
> >> >
> >> > While I agree that a filesystem offers superior introspection and
> >> > integration with standard tools, building this complex, stateful
> >> > orchestration logic on top of VFS seemed to be forcing a square peg
> >> > into a round hole. The ioctl interface, while more opaque, provides a
> >> > direct and explicit way to command the state machine and manage these
> >> > complex lifecycle and dependency rules.
> >>
> >> I'm not going to argue that you have to switch to this kexecfs idea
> >> but...
> >>
> >> You're using a character device that's tied to devtmpfs. In other words,
> >> you're already using a filesystem interface. Literally the whole code
> >> here is built on top of filesystem APIs. So this argument is just very
> >> wrong imho. If you can build it on top of a character device using VFS
> >> interfaces you can do it as a minimal filesystem.
> >>
> >> You're free to define the filesystem interface any way you like it. We
> >> have a ton of examples there. All your ioctls would just be tied to the
> >> filesystem instance instead of the /dev/somethingsomething character
> >> device. The state machine could just be implemented the same way.
> >>
> >> One of my points is that with an fs interface you can have easy state
> >> serialization on a per-service level. IOW, you have a bunch of virtual
> >> machines running as services or some networking services or whatever.
> >> You could just bind-mount an instance of kexecfs into the service and
> >> the service can persist state into the instance and easily recover it
> >> after kexec.
> >
> > This approach sounds worth exploring more. It would avoid the need for
> > a centralized daemon to mediate the preservation and restoration of
> > all file descriptors.
>
> One of the jobs of the centralized daemon is to decide the _policy_ of
> who gets to preserve things and more importantly, make sure the right
> party unpreserves the right FDs after a kexec. I don't see how this
> interface fixes this problem. You would still need a way to identify
> which kexecfs instance belongs to whom and enforce that. The kernel
> probably shouldn't be the one doing this kind of policy so you still
> need some userspace component to make those decisions.
>
> >
> > I'm not sure that we can get rid of the machine-wide state machine
> > though, as there is some kernel state that will necessarily cross
> > these kexecfs domains (e.g. IOMMU driver state). So we still might
> > need /dev/liveupdate for that.
>
> Generally speaking, I think both VFS-based and IOCTL-based interfaces
> are more or less equally expressive/powerful. Most of the ioctl
> operations can be translated to a VFS operation and vice versa.
>
> For example, the fsopen() call is similar to open("/dev/liveupdate") --
> both would create a live update session which auto closes when the FD is
> closed or FS unmounted. Similarly, each ioctl can be replaced with a
> file in the FS. For example, LIVEUPDATE_IOCTL_FD_PRESERVE can be
> replaced with a fd_preserve file where you write() the FD number.
> LIVEUPDATE_IOCTL_GET_STATE or LIVEUPDATE_IOCTL_PREPARE, etc. can be
> replaced by a "state" file where you can read() or write() the state.
>
> I think the main benefit of the VFS-based interface is ease of use.
> There already exist a bunch of utilities and libraries that we can use to
> interact with files. When we have ioctls, we would need to write
> everything ourselves. For example, instead of
> LIVEUPDATE_IOCTL_GET_STATE, you can do "cat state", which is a bit
> easier to do.
>
> As for downsides, I think we might end up with a bit more boilerplate
> code, but beyond that I am not sure.

One of the points in Christian's suggestion was that ioctl doesn't have
to be bound to a misc device. Even if we don't use read()/write()/link()
etc, we can have a filesystem that exposes, say, a "control" file, and
that file has the same liveupdate_ioctl() in its fops as we have now in
miscdev.

The cost is indeed a bit of boilerplate code to create the filesystem,
but it would be easier to extend for per-service and container support.
And we won't need a sysfs entry for status, as it can also be
pre-populated in kexecfs (or whatever it'll be called).

> -- 
> Regards,
> Pratyush Yadav

-- 
Sincerely yours,
Mike.