From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38B77C46477 for ; Mon, 17 Jun 2019 21:37:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0C07A20657 for ; Mon, 17 Jun 2019 21:37:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1560807458; bh=S4e/gtUZeusv/WpAX1lvOIB/VF9tw+8nMleUNsOV9Ko=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=zWFJmyDCzYNL8zh/KR9rJ+GW4wMu9ns9dZUjvf8wcQ/Zo1t6AkMmx98tLrr8pFuLu 5ElQuXYC1sAes8Bz8ha3bKjkbOxR08HBquzJa0/z8Daku4j4ChOkn7ov9CtBMplN8p dBNef51+dCnu2Uljp/7O5CkRMW1WfgIvHGAanWB4= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728917AbfFQVTs (ORCPT ); Mon, 17 Jun 2019 17:19:48 -0400 Received: from mail.kernel.org ([198.145.29.99]:43724 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728446AbfFQVTs (ORCPT ); Mon, 17 Jun 2019 17:19:48 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 5963E2089E; Mon, 17 Jun 2019 21:19:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1560806386; bh=S4e/gtUZeusv/WpAX1lvOIB/VF9tw+8nMleUNsOV9Ko=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=1K7ZBzhM8zxS140h+MuwfaaOV+b10kbsUAyRlfdThazI30d+Iqbz9/BSUAmYZXA0s j8K9yZIy6KeR5+MyPxmkV4KhvmxIo21KjhJnntSN+9NzO/4NHpxOvzzNm32ZDZ9mv5 DeNHNy2yfade5zEFp2pl1aWedS0GBrbuDydIwOPg= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Tejun Heo Subject: [PATCH 5.1 033/115] cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css() Date: Mon, 17 Jun 2019 23:08:53 +0200 Message-Id: <20190617210801.657189956@linuxfoundation.org> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190617210759.929316339@linuxfoundation.org> References: <20190617210759.929316339@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Tejun Heo commit 18fa84a2db0e15b02baa5d94bdb5bd509175d2f6 upstream. A PF_EXITING task can stay associated with an offline css. If such task calls task_get_css(), it can get stuck indefinitely. This can be triggered by BSD process accounting which writes to a file with PF_EXITING set when racing against memcg disable as in the backtrace at the end. After this change, task_get_css() may return a css which was already offline when the function was called. None of the existing users are affected by this change. INFO: rcu_sched self-detected stall on CPU INFO: rcu_sched detected stalls on CPUs/tasks: ... NMI backtrace for cpu 0 ... Call Trace: dump_stack+0x46/0x68 nmi_cpu_backtrace.cold.2+0x13/0x57 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x9e/0xce rcu_check_callbacks.cold.74+0x2af/0x433 update_process_times+0x28/0x60 tick_sched_timer+0x34/0x70 __hrtimer_run_queues+0xee/0x250 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x56/0x110 apic_timer_interrupt+0xf/0x20 RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0 ... btrfs_file_write_iter+0x31b/0x563 __vfs_write+0xfa/0x140 __kernel_write+0x4f/0x100 do_acct_process+0x495/0x580 acct_process+0xb9/0xdb do_exit+0x748/0xa00 do_group_exit+0x3a/0xa0 get_signal+0x254/0x560 do_signal+0x23/0x5c0 exit_to_usermode_loop+0x5d/0xa0 prepare_exit_to_usermode+0x53/0x80 retint_user+0x8/0x8 Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org # v4.2+ Fixes: ec438699a9ae ("cgroup, block: implement task_get_css() and use it in bio_associate_current()") Signed-off-by: Greg Kroah-Hartman --- include/linux/cgroup.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -487,7 +487,7 @@ static inline struct cgroup_subsys_state * * Find the css for the (@task, @subsys_id) combination, increment a * reference on and return it. This function is guaranteed to return a - * valid css. + * valid css. The returned css may already have been offlined. */ static inline struct cgroup_subsys_state * task_get_css(struct task_struct *task, int subsys_id) @@ -497,7 +497,13 @@ task_get_css(struct task_struct *task, i rcu_read_lock(); while (true) { css = task_css(task, subsys_id); - if (likely(css_tryget_online(css))) + /* + * Can't use css_tryget_online() here. A task which has + * PF_EXITING set may stay associated with an offline css. + * If such task calls this function, css_tryget_online() + * will keep failing. + */ + if (likely(css_tryget(css))) break; cpu_relax(); }