From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750833AbVH2NQu (ORCPT ); Mon, 29 Aug 2005 09:16:50 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750931AbVH2NQu (ORCPT ); Mon, 29 Aug 2005 09:16:50 -0400 Received: from magic.adaptec.com ([216.52.22.17]:49127 "EHLO magic.adaptec.com") by vger.kernel.org with ESMTP id S1750833AbVH2NQu (ORCPT ); Mon, 29 Aug 2005 09:16:50 -0400 Message-ID: <43130AB9.3080805@adaptec.com> Date: Mon, 29 Aug 2005 09:16:41 -0400 From: Luben Tuikov User-Agent: Mozilla Thunderbird 1.0.6 (X11/20050716) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Trond Myklebust CC: Nick Piggin , James Bottomley , Andrew Morton , Linux Kernel Subject: Re: [PATCH] make radix tree gang lookup faster by using a bitmap search References: <1125159996.5159.8.camel@mulgrave> <20050827105355.360bd26a.akpm@osdl.org> <1125276312.5048.22.camel@mulgrave> <20050828175233.61cada23.akpm@osdl.org> <1125278389.5048.30.camel@mulgrave> <20050828183531.0b4d6f2d.akpm@osdl.org> <1125285994.5048.40.camel@mulgrave> <4312830C.8000308@yahoo.com.au> <1125287651.29493.3.camel@lade.trondhjem.org> In-Reply-To: <1125287651.29493.3.camel@lade.trondhjem.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-OriginalArrivalTime: 29 Aug 2005 13:16:44.0336 (UTC) FILETIME=[E6E14B00:01C5AC9B] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On 08/28/05 23:54, Trond Myklebust wrote: > må den 29.08.2005 Klokka 13:37 (+1000) skreiv Nick Piggin: > >>James Bottomley wrote: >> >>>On Sun, 2005-08-28 at 18:35 -0700, Andrew Morton wrote: >> >>>>It does make the tree higher and hence will incur some more cache missing >>>>when descending the tree. >>> >>> >>>Actually, I don't think it does: the common user is the page tree. >>>Obviously, I've changed nothing on 64 bits, so we only need to consider >>>what I've done on 32 bits. A page size is almost universally 4k on 32 >>>bit, so we need 20 bits to store the page tree index. Regardless of >>>whether the index size is 5 or 6, that gives a radix tree depth of 4. >>> >> >>s/common/only ? > > > grep again... I use it in the NFS client for bookkeeping requests. Hey guys, Take the posix-timers for example: idr_remove() is called with spin_lock_irqsave()/spin_unlock_irqrestore(), in release_posix_timer(). _But_, idr_pre_get() is called without any locks (as it should) which calls alloc_layer() in the IDR code, grabbing a spinlock _internally_ (no IRQ). If you're in the middle of idr_pre_get() inside the internally locked section of alloc_layer(), _and_ you call idr_remove() at that same time and (try to) enter free_layer(), you have a deadlock. The reason no one has seen this is: _load_. It wasn't until I started doing mkfs simultaneously on 3 or more disks in a loop, that I saw this problem. I've a stack trace to show if anyone cares. I'm not sure why no one sees this. I'm also not sure why people are working on IDR when there is no _pending_ need to improve this code (other than the deadlock I pointed out), while SCSI Core has been in dire need of improvement for the last 5 years. If SCSI Core developed in the same scheme as IDR is now: "no need, but its cool so I'll write it and my buddies will include the patch", we would have a *T10 spec-ed out SCSI Core by now*.(1) Luben (1) I can hear it now... "But do we need it?" But do we need these IDR improvements when there is no bug report or oops or architectural need to change it? Other than the deadlock?