From: Fabiano Rosas
To: Daniel P. Berrangé
Cc: Markus Armbruster, qemu-devel@nongnu.org, Juan Quintela, Peter Xu, Leonardo Bras, Claudio Fontana, Eric Blake
Subject: Re: [PATCH v2 28/29] migration: Add direct-io parameter
References: <878r7svapt.fsf@pond.sub.org> <87msw7ddfp.fsf@suse.de> <87cyx2epsv.fsf@suse.de> <87cywvenbd.fsf@suse.de> <878r7jdjrf.fsf@suse.de> <875y2meua3.fsf@suse.de>
Date: Tue, 31 Oct 2023 12:52:41 -0300
Message-ID: <8734xqeqly.fsf@suse.de>

Daniel P. Berrangé writes:

> On Tue, Oct 31, 2023 at 11:33:24AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé writes:
>>
>> > On Tue, Oct 31, 2023 at 10:05:56AM -0300, Fabiano Rosas wrote:
>> >> Daniel P. Berrangé writes:
>> >>
>> >> > On Mon, Oct 30, 2023 at 07:51:34PM -0300, Fabiano Rosas wrote:
>> >> >> I could use some advice on how to solve this situation. The fdset code
>> >> >> at monitor/fds.c and the add-fd command don't seem to be usable outside
>> >> >> the original use-case of passing fds with different open flags.
>> >> >>
>> >> >> There are several problems, the biggest one being that there's no way to
>> >> >> manipulate the set of file descriptors aside from asking for duplication
>> >> >> of an fd that matches a particular set of flags.
>> >> >>
>> >> >> That doesn't work for us because the two fds we need (one for main
>> >> >> channel, other for secondary channels) will have the same open flags. So
>> >> >> the fdset code will always return the first one it finds in the set.
>> >> >
>> >> > QEMU may want multiple FDs *internally*, but IMHO that fact should
>> >> > not be exposed to mgmt applications. It would be valid for a QEMU
>> >> > impl to share the same FD across multiple threads, or have a different
>> >> > FD for each thread. All threads are using pread/pwrite, so it is safe
>> >> > for them to use the same FD if they desire. It is a private impl choice
>> >> > for QEMU at any given point in time and could change over time.
>> >> >
>> >>
>> >> Sure, I don't disagree.
>> >> However up until last week we had a seemingly
>> >> usable "add-fd" command that allows the user to provide a *set of file
>> >> descriptors* to QEMU. It's just now that we're learning that interface
>> >> serves only a special use-case.
>> >
>> > AFAICT though we don't need add-fd to support passing many files
>> > for our needs. Saving only requires a single FD. All others can
>> > be opened by dup(), so the limitation of add-fd is irrelevant
>> > surely ?
>>
>> Only once we decide to use one FD. If we had a generic add-fd backend,
>> then that's already a user-facing API, so the "implementation detail"
>> argument becomes weaker.
>>
>> With a single FD we'll need to be very careful about what code is
>> allowed to run while the multifd channels are doing IO. Since O_DIRECT
>> is not widely supported, now we have to also be careful about someone
>> using that QEMUFile handle to do unaligned writes and not even noticing
>> that it breaks direct IO. None of this is unworkable, of course, I just
>> find the design way clearer with just the file name + offset.
>
> I guess I'm not seeing the problem still. A single FD is passed across
> from libvirt, but QEMU is free to turn that into *many* FDs for its
> internal use, using dup() and then setting O_DIRECT on as many/few of
> the dup()d FDs as it wants to.

The problem is that duplicated FDs share the file status flags. If we
set O_DIRECT on the multifd channels and the main thread happens to do
an unaligned write with qemu_file_put* then the filesystem will fail
that write.
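
For illustration, here is a minimal sketch of the flag sharing described
above. It is not QEMU code. It is written in Python and uses O_APPEND as a
stand-in for O_DIRECT, on the assumption that both are file status flags
changed through fcntl(F_SETFL); O_DIRECT itself is avoided because not
every filesystem accepts it. The helper name flags_shared_after_dup is
made up for this example.

```python
import fcntl
import os
import tempfile


def flags_shared_after_dup(flag: int = os.O_APPEND) -> dict:
    """Set `flag` on a dup()'d descriptor and report which descriptors see it.

    Hypothetical helper for illustration only; O_APPEND stands in for O_DIRECT.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        original = os.open(tmp.name, os.O_WRONLY)
        # dup() returns a new descriptor for the SAME open file description.
        duplicate = os.dup(original)
        # A second open() creates an independent open file description.
        reopened = os.open(tmp.name, os.O_WRONLY)
        try:
            current = fcntl.fcntl(duplicate, fcntl.F_GETFL)
            # Change the status flags through the duplicate only.
            fcntl.fcntl(duplicate, fcntl.F_SETFL, current | flag)
            return {
                "duplicate": bool(fcntl.fcntl(duplicate, fcntl.F_GETFL) & flag),
                "original": bool(fcntl.fcntl(original, fcntl.F_GETFL) & flag),
                "reopened": bool(fcntl.fcntl(reopened, fcntl.F_GETFL) & flag),
            }
        finally:
            for fd in (original, duplicate, reopened):
                os.close(fd)


if __name__ == "__main__":
    print(flags_shared_after_dup())
```

On a POSIX system the flag should show up on both the duplicate and the
original, but not on the descriptor obtained from a separate open(). That
is the reason reopening the file by name gives flags that can be set
independently per descriptor, while dup() does not.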