Date: Mon, 21 Aug 2023 21:54:58 +0200
From: Oleg Nesterov
To: Kui-Feng Lee
Cc: Yonghong Song, Kui-Feng Lee, Andrii Nakryiko, "Eric W. Biederman", Martin KaFai Lau, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] bpf: task_group_seq_get_next: cleanup the usage of next_thread()
Message-ID: <20230821195458.GC12526@redhat.com>
References: <20230821150909.GA2431@redhat.com> <20230821183443.GA12526@redhat.com>
In-Reply-To: <20230821183443.GA12526@redhat.com>
X-Mailing-List: bpf@vger.kernel.org

So I still think the pid_alive() check should die... And looking at this
code again, I don't understand why it abuses task_struct->usage; I'll send
another patch on top of this one.

On 08/21, Oleg Nesterov wrote:
>
> On 08/21, Kui-Feng Lee wrote:
> >
> >
> > On 8/21/23 08:09, Oleg Nesterov wrote:
> > >1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we
> > >   can safely iterate the task->thread_group list. Even if this task exits
> > >   right after get_pid_task() (or goto retry) and pid_alive() returns 0
> > >
> > >   Kill the unnecessary pid_alive() check.
> >
> > This function returns next_task holding a refcount and does not release
> > that refcount until the next call to the same function. Meanwhile,
> > the returned task A may be killed, and its next task B may be
> > killed after A as well, before this function is called again.
> > However, even if task B is destroyed (freed), A's next still points to
> > task B.
> > When this function is called again for the same iterator,
> > there is no guarantee that B is still there.
>
> Not sure I understand...
>
> OK, if we have a task pointer with an incremented refcount and do not hold
> the rcu lock, then yes, you can't remove the pid_alive() check in this code:
>
> 	rcu_read_lock();
> 	if (pid_alive(task))
> 		do_something(next_thread(task));
> 	rcu_read_unlock();
>
> because task and then task->next can exit and do call_rcu(delayed_put_task_struct)
> before we take rcu_read_lock().
>
> But if you do something like
>
> 	rcu_read_lock();
>
> 	task = find_task_in_some_rcu_protected_list();
> 	do_something(next_thread(task));
>
> 	rcu_read_unlock();
>
> then next_thread(task) should be safe without pid_alive().
>
> And iiuc task_group_seq_get_next() always does
>
> 	rcu_read_lock(); // the caller does lock/unlock
>
> 	task = get_pid_task(pid, PIDTYPE_PID);
> 	if (!task)
> 		return;
>
> 	next_task = next_thread(task);
>
> 	rcu_read_unlock();
>
> Yes, both task and task->next can exit right after get_pid_task(), but since
> this can only happen after we took rcu_read_lock(), delayed_put_task_struct()
> can't be called until we drop the rcu lock.
>
> What have I missed?
>
> Oleg.
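The two cases contrasted above (a task pinned only by a refcount versus a task looked up inside the same read-side critical section) can be reproduced with a small userspace toy model. This is a hedged sketch, not kernel code: every toy_* name is hypothetical, RCU is reduced to a reader counter plus a queue of deferred puts standing in for call_rcu(delayed_put_task_struct), the hashed flag stands in for pid_alive(), and a freed flag replaces the real free so that a stale successor can be detected without undefined behaviour. toy_exit() is modeled on list_del_rcu(), which unlinks an entry but leaves its own next pointer aimed at the old successor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for task_struct: one entry in a circular thread-group list. */
struct toy_task {
	struct toy_task *next, *prev;	/* like task->thread_group */
	bool hashed;	/* stands in for pid_alive() */
	int refs;	/* like task_struct->usage */
	bool freed;	/* set instead of freeing, so a stale successor is observable */
};

/* Toy RCU: a reader-nesting counter plus a queue of deferred puts. */
static int toy_readers, toy_npending;
static struct toy_task *toy_pending[16];
static void toy_rcu_read_lock(void) { toy_readers++; }
static void toy_rcu_read_unlock(void) { toy_readers--; }

static void toy_put(struct toy_task *t)
{
	if (--t->refs == 0)
		t->freed = true;
}

/* Like call_rcu(delayed_put_task_struct): queue the put, do not run it yet. */
static void toy_call_rcu_put(struct toy_task *t)
{
	assert(toy_npending < 16);
	toy_pending[toy_npending++] = t;
}

/* The queued puts may only run once no reader is inside a critical section. */
static bool toy_try_grace_period(void)
{
	if (toy_readers > 0)
		return false;
	for (int i = 0; i < toy_npending; i++)
		toy_put(toy_pending[i]);
	toy_npending = 0;
	return true;
}

/* Exit: unlink like list_del_rcu(), leaving t->next aimed at the old successor. */
static void toy_exit(struct toy_task *t)
{
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->hashed = false;
	toy_call_rcu_put(t);
}

static struct toy_task *toy_next_thread(struct toy_task *t) { return t->next; }

static void toy_init_group(struct toy_task *g, int n)
{
	for (int i = 0; i < n; i++)
		g[i] = (struct toy_task){ .next = &g[(i + 1) % n],
			.prev = &g[(i + n - 1) % n], .hashed = true, .refs = 1 };
}

/* Lookup and next_thread() in one read-side section, without pid_alive(). */
static bool toy_lookup_inside_rcu(void)
{
	struct toy_task g[3], *a = &g[0];
	bool ok;

	toy_init_group(g, 3);
	toy_rcu_read_lock();
	a->refs++;		/* like get_pid_task() on a still-hashed task */
	toy_exit(a);		/* A exits right after the lookup ... */
	toy_exit(a->next);	/* ... and so does its successor B */
	toy_try_grace_period();	/* returns false: this reader blocks it */
	ok = !toy_next_thread(a)->freed;	/* B is still allocated */
	toy_rcu_read_unlock();
	toy_try_grace_period();	/* queued puts run now: B is released */
	return ok;
}

/* A pinned only by an earlier refcount; the read-side section starts later. */
static bool toy_pinned_outside_rcu(bool check_pid_alive)
{
	struct toy_task g[3], *a = &g[0];
	bool safe;

	toy_init_group(g, 3);
	a->refs++;		/* reference kept from a previous call */
	toy_exit(a);
	toy_exit(a->next);
	toy_try_grace_period();	/* no readers, so B's last reference is dropped */
	toy_rcu_read_lock();
	if (check_pid_alive && !a->hashed)
		safe = true;	/* bail out without touching a->next */
	else
		safe = !toy_next_thread(a)->freed;
	toy_rcu_read_unlock();
	return safe;
}
```

In this model toy_lookup_inside_rcu() returns true, because the deferred put of B cannot run while the reader is active. toy_pinned_outside_rcu(false) returns false, since B was released before the reader started, and toy_pinned_outside_rcu(true) returns true because the pid_alive()-style check bails out before a->next is touched. Only the second pattern needs the check, which matches the argument that a lookup performed inside the same rcu_read_lock() section makes it redundant.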