From mboxrd@z Thu Jan 1 00:00:00 1970 From: Al Viro Subject: [RFC] breakage in sysfs_readdir() and s_instances abuse in sysfs Date: Sat, 4 Jun 2011 01:15:19 +0100 Message-ID: <20110604001518.GT11521@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds To: "Eric W. Biederman" Return-path: Received: from zeniv.linux.org.uk ([195.92.253.2]:54615 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754148Ab1FDAP2 (ORCPT ); Fri, 3 Jun 2011 20:15:28 -0400 Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org List-ID: sysfs_readdir() starts with doing ns = sysfs_info(dentry->d_sb)->ns[type]; then proceeds to scan the directory with sysfs_dir_{,next_}pos(ns, ...) even though we have no promise whatsoever that sysfs_exit_ns() has not cleared that element of array in the meanwhile. Granted, sysfs_dir_pos() doesn't dereference ns; however, it does compare with it. And ns might have been freed and reused by that point. I don't like what's going on there; that code looks inherently racy. We never set ->ns[...] non-NULL outside of mount(). So it looks like the intended behaviour is to have all ns-specific entries of that type disappear forever from that fs instance. Having entries for _another_ namespace to show up for a (short) while, and that only in readdir() (lookup runs completely under sysfs_mutex, so we don't have that race there)... I also don't like the s_instances abuse in sysfs_exit_ns() but I'd like to understand what is that code *for* before starting to massage it into something saner. What are the possible locking conditions in callers of that thing? I've tried to look for callers, but the (too long) callchain has led me to pernet_operations ->exit() and at that point I'd given up - that over the top for grepping... BTW, is kobj_ns_current() pinned down? Can that ns disappear from under sysfs_mount()? It looks like it shouldn't, but then I haven't traced the callers of sysfs_exit_ns()... What I'm planning to do (for unrelated reasons - ubifs needs it) is to add an analog of iterate_supers() that would go over the superblocks of given type and call a function on them. I would like to convert sysfs_exit_ns() to it and kill the last abuser of s_instances (and one of the last sb_lock ones), but that really depends on what kind of locking is needed for readdir() and friends - as it is, the damn thing looks *wrong*.