From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753737AbcAVNk7 (ORCPT ); Fri, 22 Jan 2016 08:40:59 -0500 Received: from mga03.intel.com ([134.134.136.65]:12142 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753405AbcAVNky (ORCPT ); Fri, 22 Jan 2016 08:40:54 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,331,1449561600"; d="scan'208";a="866176051" Date: Fri, 22 Jan 2016 08:40:12 -0500 From: Matthew Wilcox To: linux-kernel@vger.kernel.org, Konstantin Khlebnikov , Hugh Dickins , Andrew Morton Subject: Bug in radix tree gang lookup? Message-ID: <20160122134012.GC2948@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I think there's a race in radix_tree_gang_lookup() (and related functions). I was trying to understand why we need the 'indirect_to_ptr()' call here: radix_tree_for_each_slot(slot, root, &iter, first_index) { results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); if (!results[ret]) continue; if (++ret == max_items) break; } The slots returned are supposed to be leaf nodes, so why would they ever have the indirect bit set? The only two cases I can think of where we'd see a slot with the indirect bit set is if we're calling radix_tree_gang_lookup() under the RCU read lock and simultaneously growing / shrinking the tree. When the tree transitions from height 0 to height 1, the 'slot' that was returned is now an internal pointer, so simply knocking off the 'indirect_to_ptr()' bit is the wrong thing to do; instead of returning a struct page pointer, we return a pointer to a radix_tree_node, which isn't good. When shrinking the tree from height 1 to height 0, we may end up looking at a pointer in to-be-freed memory, but it's still a valid pointer to a struct page, so I think we're OK in the shrink case. The lockless page cache shows how to handle this correctly; when we see an indirect bit come back in a slot, we should retry the lookup. I think that's the right thing to do in this case, but I'd like someone to check my reasoning before I propose a patch.