From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD8D1C43387 for ; Fri, 11 Jan 2019 07:23:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A810D20872 for ; Fri, 11 Jan 2019 07:23:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730336AbfAKHXS (ORCPT ); Fri, 11 Jan 2019 02:23:18 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:42898 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729499AbfAKHXS (ORCPT ); Fri, 11 Jan 2019 02:23:18 -0500 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.91 #2 (Red Hat Linux)) id 1ghr9q-0000eP-00; Fri, 11 Jan 2019 07:23:10 +0000 Date: Fri, 11 Jan 2019 07:23:09 +0000 From: Al Viro To: Zefan Li Cc: Tejun Heo , Linus Torvalds , linux-fsdevel@vger.kernel.org Subject: [heads-up] buggered refcounting logics in cgroup1_mount() Message-ID: <20190111072309.GT2217@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org There'd been a long string of kludges in that area, and unfortunately even the latest variant of solution is broken. You rely upon the trick with percpu_ref_reinit() being done only after cgroup_do_mount(). That prevents parallel mount picking cgroup_root of superblock in the middle of setting up, then losing CPU, original mount succeeding and umount marking our cgroup_root killed; then the second mount regains CPU and proceeds to set a new superblock. Umount of _that_ would try to kill cgroup_root one more time, with obvious nastiness. Delaying the moment when cgroup_root can be grabbed solves that, but it does nothing for another similar case: * cgroup1 is mounted and populated * umount doesn't kill cgroup_root since it's non-empty * mount attempt picks cgroup_root, tries to do kernfs_pin_sb() (which finds no superblocks) and successfully bumps the refcount. Then it drops cgroup_mutex and proceeds to lose CPU (preemption, blocking allocation in kernfs, whatever). * another mount attempt picks the same cgroup_root, again finds no superblock, bumps the refcount and proceeds to mount the sucker. Then, before the first one regains CPU, it manages to empty the tree and umount it, marking cgroup_root killed. * now the first attempt regains CPU and we are in the same situation as with the original bug. Another problem with that is that (without any races) failing allocation in d_make_root() within kernfs_fill_super() will end up in cgroup_kill_sb(), with cgroup_root _still_ marked killed (your kludge creates it that way and we hadn't reached the reinit yet). AFAICS, a cleaner solution would be this: * to hell with kernfs_pin_sb(); just try to grab a reference to cgroup_root on reuse. * have cgroup_kill_sb() treat "it's already marked killed" as "just drop the reference, then". * after cgroup_do_mount() check if cgroup_root got marked killed and do deactivate_locked_super() in such case (with the same restart_syscall() failure exit). Objections? I would love to kill kernfs_pin_sb() as a followup (it's a fundamentally broken API), but that's not a stable fodder; some fix of refcounting bugs, OTOH, should be.