Date: Fri, 2 Jun 2023 19:58:47 +0200
From: Oleg Nesterov
To: Jason Wang
Cc: Mike Christie, linux@leemhuis.info, nicolas.dichtel@6wind.com,
	axboe@kernel.dk, ebiederm@xmission.com, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
	mst@redhat.com, sgarzare@redhat.com, stefanha@redhat.com,
	brauner@kernel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Message-ID: <20230602175846.GC555@redhat.com>
References: <20230522174757.GC22159@redhat.com>
	<20230523121506.GA6562@redhat.com>
	<26c87be0-8e19-d677-a51b-e6821e6f7ae4@redhat.com>
	<20230531072449.GA25046@redhat.com>
	<20230531091432.GB25046@redhat.com>
	<20230601074315.GA13133@redhat.com>

On 06/02, Jason Wang wrote:
>
> On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov wrote:
> >
> > and the final rewrite:
> >
> >         if (work->node) {
> >                 work_next = work->node->next;
> >                 if (true)
> >                         clear_bit(&work->flags);
> >         }
> >
> > so again, I do not see the load-store control dependency.
>
> This kind of optimization is suspicious. Especially considering it's
> the control expression of the loop but not a condition.

It is not about optimization,

> Looking at the assembly (x86):
>
>    0xffffffff81d46c5b <+75>:    callq  0xffffffff81689ac0
>    0xffffffff81d46c60 <+80>:    mov    %rax,%r15
>    0xffffffff81d46c63 <+83>:    test   %rax,%rax
>    0xffffffff81d46c66 <+86>:    je     0xffffffff81d46c3a
>    0xffffffff81d46c68 <+88>:    mov    %r15,%rdi
>    0xffffffff81d46c6b <+91>:    mov    (%r15),%r15
>    0xffffffff81d46c6e <+94>:    lock andb $0xfd,0x10(%rdi)
>    0xffffffff81d46c73 <+99>:    movl   $0x0,0x18(%rbx)
>    0xffffffff81d46c7a <+106>:   mov    0x8(%rdi),%rax
>    0xffffffff81d46c7e <+110>:   callq  0xffffffff821b39a0 <__x86_indirect_thunk_array>
>    0xffffffff81d46c83 <+115>:   callq  0xffffffff821b4d10 <__SCT__cond_resched>
> ...
>
> I can see:
>
> 1) The code read node->next (+91) before clear_bit (+94)

The code does, but what about the CPU?

> 2) And the it uses a lock prefix to guarantee the execution order

As I said from the very beginning, this code is fine on x86 because
atomic ops are fully serialised on x86.

OK, we can't convince each other. I'll try to write another email when
I have time.

If this code is correct, then my understanding of memory barriers is even
worse than I think. I wouldn't be surprised, but I'd like to understand
what I have missed.

Oleg.