From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D538FCD98D2 for ; Tue, 16 Jun 2026 04:37:02 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wZLXK-0001nT-Fr; Tue, 16 Jun 2026 00:36:30 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wZLXI-0001nC-J8 for qemu-arm@nongnu.org; Tue, 16 Jun 2026 00:36:28 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wZLXG-0005GH-DH for qemu-arm@nongnu.org; Tue, 16 Jun 2026 00:36:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1781584584; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Oa35p+zVQkCpDU+MCOXFeys+VK2aUOfD6XhKQQQ9Wk8=; b=GC/5pryFUFdCEFqIJw4eroFAXIF2dgrX7pS09c3aQA4fx8a2+XMggM87L5GAWmbBuLI/rK 4vYo3thT8v2CxG9r6mXuOS6qF3sGGXLBFbO0cZP+wD1cBfhZ2VGi0Nva4/Ip4ELmmSYLN7 CADM5PWo8htkgSH1dvNNtfOuRNRO8ZY= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-685-OlL6l_PAMhObJDSHJT5CIA-1; Tue, 16 Jun 2026 00:36:23 -0400 X-MC-Unique: OlL6l_PAMhObJDSHJT5CIA-1 X-Mimecast-MFC-AGG-ID: OlL6l_PAMhObJDSHJT5CIA_1781584582 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-490ae016853so29762285e9.2 for ; Mon, 15 Jun 2026 21:36:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781584582; x=1782189382; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Oa35p+zVQkCpDU+MCOXFeys+VK2aUOfD6XhKQQQ9Wk8=; b=BgeS8is2tQyP27ac94Xd8gkhv1Sl067tWF05dA1U9OFWkveO2bOFFaKFjqhY5SbC5h S8MV9YxiPjQHFWXLLfqViYZjE+sXPCcuX7sTm0UfHX669RWzx1Fc2XcnWn9i8V4zIS3C LqLeJZfjktane28/bz2UL8qZ5cUIJWirM5ztdk+MJjbU1EdSciUqRSwtH55aDPmqBkM4 0spAQOfRg35cqjZN1PVxsR/oiLGsPBFAV1zYTms50lD0/+O5Nbe8ds1/rCQDl6WlC7bx bU76r6onppOX1D/eIzFEDSqKIy0Y/wZyak3DtI1Tm91S4PDr/D0BJicT/2k5b426I8vC ZZ0A== X-Forwarded-Encrypted: i=1; AFNElJ+O8pZGLDl4glMLt0o3FqIRAhFzZmSo6we6qWHuOerVdLNVgd9IYUUW9h0Xfi/t9hqXAUd8QEgAxQ==@nongnu.org X-Gm-Message-State: AOJu0YwVQ60sjtNw4d7mw43JUP08Tidx3xR6qhNj3MjAGCEij8nEAZBg neaX1ny/JkTuI/xjSaSwzp3UE8anyLvN57mSeot+VqdSuUqwoV8E/ly9TjY9JFufJq2Z84G4gTm rwfaG5kSkMyXxgxE0DgDxBympZ6vpIR17m9g9qOJ3qvsuB3TOwxxGeg== X-Gm-Gg: Acq92OFnQ/ZnY3bYX3GknWnkUWvQ4funTWN/m9qeThxBOBETzJRDKefy6MOYxwRjRS0 KRbJu4yVLkT/xbfAKMMap+NKoEOY5441vHgHqXUZRzakFWtxyeDkX6cvw0DB7W+y4UXo5HUMlks x4dtnqD2sji4gzQJmZ8tuwL3wpp6oRj3bJhN8QICkEnmys1AMTtEG9BbNxkJV3VI68yzMoG35Um geSharGuXtOR8Uk38UjSXX9CXIaflydNO+OwYvgAzrrGBk7CBTWq44Z5jg/TE6+df9SJiRoaM3L blWGw21CB4greGtTDBCgoGT13f6x4yzow6VNjq8GdEkTR/dZOogQaGVYEIDtVAKqyzY9+5yLrBM KPMqbV8eC0OYF47SuN/k0UV5rXxpCjas2w+r+dHDQYlw= X-Received: by 2002:a05:600c:1591:b0:490:e281:287d with SMTP id 5b1f17b1804b1-490ec4c0ceemr144346665e9.1.1781584581733; Mon, 15 Jun 2026 21:36:21 -0700 (PDT) X-Received: by 2002:a05:600c:1591:b0:490:e281:287d with SMTP id 5b1f17b1804b1-490ec4c0ceemr144346435e9.1.1781584581140; Mon, 15 Jun 2026 21:36:21 -0700 (PDT) Received: from redhat.com (IGLD-80-230-85-71.inter.net.il. [80.230.85.71]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4922fa96f0esm47190165e9.12.2026.06.15.21.36.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 15 Jun 2026 21:36:20 -0700 (PDT) Date: Tue, 16 Jun 2026 00:36:16 -0400 From: "Michael S. Tsirkin" To: Gavin Shan Cc: Peter Maydell , qemu-arm@nongnu.org, qemu-devel@nongnu.org, peterx@redhat.com, alex@shazbot.org, richard.henderson@linaro.org, berrange@redhat.com, philmd@oss.qualcomm.com, philmd@mailo.com, david@kernel.org, clg@redhat.com, pbonzini@redhat.com, phrdina@redhat.com, jugraham@redhat.com, liugang24219@sangfor.com.cn, dinghui@sangfor.com.cn, shan.gavin@gmail.com Subject: Re: [PATCH v2 1/2] system/memory: Use qemu_ram_{copy, move}() in ram device region accessors Message-ID: <20260616002413-mutt-send-email-mst@kernel.org> References: <20260615100200.266968-1-gshan@redhat.com> <20260615100200.266968-2-gshan@redhat.com> <20260615105556-mutt-send-email-mst@kernel.org> <20260615154054-mutt-send-email-mst@kernel.org> <22d55870-b521-4002-add7-791bce8ea96a@redhat.com> <2535e092-b3ef-427c-9d33-39861b3a43f1@redhat.com> MIME-Version: 1.0 In-Reply-To: <2535e092-b3ef-427c-9d33-39861b3a43f1@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: AUH9TgRX06_sGjtvJ1JLpsvZI_OgsWYhOjO1KfE8HTM_1781584582 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=170.10.129.124; envelope-from=mst@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.445, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-arm@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-arm-bounces+qemu-arm=archiver.kernel.org@nongnu.org Sender: qemu-arm-bounces+qemu-arm=archiver.kernel.org@nongnu.org On Tue, Jun 16, 2026 at 02:22:18PM +1000, Gavin Shan wrote: > On 6/16/26 7:31 AM, Gavin Shan wrote: > > On 6/16/26 5:42 AM, Michael S. Tsirkin wrote: > > > On Tue, Jun 16, 2026 at 05:24:15AM +1000, Gavin Shan wrote: > > > > Hi Peter and Michael, > > > > > > > > On 6/16/26 1:12 AM, Peter Maydell wrote: > > > > > On Mon, 15 Jun 2026 at 15:56, Michael S. Tsirkin wrote: > > > > > > > > > > > > On Mon, Jun 15, 2026 at 11:57:09AM +0100, Peter Maydell wrote: > > > > > > > On Mon, 15 Jun 2026 at 11:03, Gavin Shan wrote: > > > > > > > > > > > > > > > > All ram device regions were turned to be indirectly accessible by commit > > > > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads > > > > > > > > to guest hang on attempt to build 'cuda-samples' as reported by Julia. The > > > > > > > > guest is started by the following command lines, with GH100 GPU card passed > > > > > > > > from the host. > > > > > > > > > > > > > > > diff --git a/include/system/memory.h b/include/system/memory.h > > > > > > > > index 1417132f6d..5878727d09 100644 > > > > > > > > --- a/include/system/memory.h > > > > > > > > +++ b/include/system/memory.h > > > > > > > > @@ -2897,6 +2897,8 @@ void address_space_register_map_client(AddressSpace *as, QEMUBH *bh); > > > > > > > >    void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh); > > > > > > > > > > > > > > > >    /* Internal functions, part of the implementation of address_space_read.  */ > > > > > > > > +void qemu_ram_copy(void *dest, const void *src, size_t n); > > > > > > > > +void qemu_ram_move(void *dest, const void *src, size_t n); > > > > > > > > > > > > > > New function prototypes in include headers need documentation comments. > > > > > > > > > > > > > > In particular for these, it's really important that we clearly say > > > > > > > what semantics we are attempting to provide with them, so that > > > > > > > (a) when we're reviewing or later updating the implementation we > > > > > > > know what we are trying to provide > > > > > > > (b) when we're looking at the callsites we know what the function > > > > > > > is guaranteeing to us and what it is not, and thus whether it's > > > > > > > OK to use it or we need something els. > > > > > > > > > > > > > > > +#if defined(__i386__) || defined(__x86_64__) > > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 1 > > > > > > > > +#else > > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 0 > > > > > > > > +#endif > > > > > > > > > > > > > > We need to do something better than this. We can't > > > > > > > just say "oh, we trust that on x86 this works": it is > > > > > > > neither actually true that the compiler guarantees it > > > > > > > > > > > > sorry guarantee what? > > > > > > > > > > Well, that's part of my point about "we need a doc comment": > > > > > what exactly are we trying to guarantee ? But whatever it is, > > > > > "hardcoded ifdef that privileges x86" is definitely the wrong > > > > > answer. > > > > > > > > > > > > > Could you please check the following comments (documentation context) for the > > > > added functions look good to you? Please let me know if there are anything > > > > can be improved. > > > > > > > > + > > > > +/** > > > > + * qemu_ram_copy: copy data to a RAMBlock > > > > + * > > > > + * @dest: destination where the data is copied to. It's the pointer returned > > > > + *        by ramblock_ptr() and its variants. > > > > + * @src: source where the data is copied from. It can be a regular memory block > > > > + *       or a pointer returned by ramblock_ptr() and its variants. > > > > + * @n: length of data to be copied > > > > + * > > > > + * The destination is always a pointer to data area of a RAMBlock which can > > > > + * be for a regular memory block or a MMIO region. The source can be a pointer > > > > + * to a regular memory block or a MMIO region. Atomicity isn't guranteed by > > > > + * memcpy() to copy data between those MMIO regions as we do in this function. > > > > + * Besides, unaligned accesses are well handled on all architectures except > > > > + * i386 and x86_64 where the unaligned accesses are supported by the CPUs. > > > > + * > > > > + * The ensured atomicity and alignment consideration, which aren't needed > > > > + * by data copy between two regular memory blocks. So performance penalty > > > > + * is introduced by this function in such circumstance. > > > > + */ > > > > +void qemu_ram_copy(void *dest, const void *src, size_t n); > > > > + > > > > +/** > > > > + * qemu_ram_move: move data to a RAMBlock > > > > + * > > > > + * @dest: destination where the data is moved to. It's the pointer returned > > > > + *        by ramblock_ptr() and its variants. > > > > + * @src: source where the data is moved from. It can be a regular memory block > > > > + *       or a pointer returned by ramblock_ptr() and its variants. > > > > + * @n: length of data to be moved > > > > + * > > > > + * The destination is always a pointer to data area of a RAMBlock which can > > > > + * be for a regular memory block or a MMIO region. The source can be a pointer > > > > + * to a regular memory block or a MMIO region. Atomicity isn't guranteed by > > > > + * memmove() to move data from or to those MMIO regions as we do in this > > > > + * function. Besides, unaligned accesses are well handled on all architectures > > > > + * except i386 and x86_64 where the unaligned accesses are supported by the > > > > + * CPUs. > > > > + * > > > > + * The ensured atomicity and alignment consideration, which aren't needed > > > > + * by data movement between two regular memory blocks. So performance penalty > > > > + * is introduced by this function in such circumstance. > > > > + */ > > > > +void qemu_ram_move(void *dest, const void *src, size_t n); > > > > + > > > > > > > > > I apologize, I don't understand what these comments are saying, at all. > > > > > > > Sorry to hear that, Michael. There are two things done in the newly added > > function qemu_ram_{copy, move}(), comparing to the original memcpy() and > > memmove(): > > > > - Data copy or movement on MMIO region(s) using __bultin_{memcopy, memmove}() > >   on x86, or qatomic_set() on other architecutures, to avoid the unexpected > >   optimization done by the compiler on memcpy/memmove. Unsafe instructions > >   can be produced by the compiler's optimization. This fixes the issue resolved > >   by commit 4a2e242bbb ("memory: Don't use memcpy for ram_device regions") > > > > - The unaligned access is handled by forcing the access size aligned to (src | dest | n) > >   on non-x86 architectures, comparing to the original memcpy() and memmove(). > > > > Please let me know if we're on same page. If we do, I can revise the comments > > accordingly. > > > > I've modified the comments to something as below. Please let me know if it's > better. The point is to emphasize qemu_ram_{copy, memmove}() are friendly to > IO regions :-) > > -----> include/system/memory.h > > +/** > + * qemu_ram_copy: copy data to a guest memory block > + * what is a memory block? ramblock? don't we need something like this for reads? > + * @dest: destination where the data is copied to. A pointer to a guest memory @dst? > + * block returned from ramblock_ptr() or its variants > + * @src: source where the data is copied from. A pointer to host memory block > + * or guest memory block returned from ramblock_ptr() or its variants. > + * @n: length of data to be copied > + * > + * When the source pointer, destinatoin pointer or both dereference guest > + * memory blocks, which can be IO blocks (regions). IO block? let's not invent terminology pls. > Under this cirtumstance, > + * data copy between those blocks is equivalent to IO accesses. It's unsafe > + * to use memcpy(), which could be translated to IO unfriendly instructions > + * by the compiler due to code level optimization, resulting in the unexpected > + * behavior. This function is ensured to be friendly to both IO and memory > + * accesses. No optimized instructions should be generated by the compiler, > + * and no unexpected behavior should be observed for IO accesses. Let's be specific, and brief, and tell user what this does: Maybe: Copy @n bytes from @src to @dst. Assumes @src and @dst do not overlap. Handles special cases such as uncacheable ramblocks correctly. Use this for accessing ramblock in response to DMA/VCPU IO and in preference to memcpy. > + */ > +void qemu_ram_copy(void *dest, const void *src, size_t n); > + > +/** > + * qemu_ram_move: move data to a guest memory block > + * > + * @dest: destination where the data is moved to. A pointer to a guest memory > + * block returned from ramblock_ptr() or its variants > + * @src: source where the data is moved from. A pointer to host memory block > + * or guest memory block returned from ramblock_ptr() or its variants. > + * @n: length of data to be copied > + * > + * Similar to qemu_ram_copy(), this function is ensured to be friendly to both > + * IO and memory accesses. No optimized instructions should be generated by > + * the compiler, and no unexpected behavior should be observed for IO accesses. unclear how it is different from copy. > + */ > +void qemu_ram_move(void *dest, const void *src, size_t n); > > > > > > > > > > > > -- PMM > > > > > > > Thanks, > Gavin