From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752904Ab0ESOjP (ORCPT ); Wed, 19 May 2010 10:39:15 -0400
Received: from mx1.redhat.com ([209.132.183.28]:15827 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752860Ab0ESOjN
	(ORCPT ); Wed, 19 May 2010 10:39:13 -0400
Date: Wed, 19 May 2010 10:39:00 -0400
From: Mike Snitzer
To: Kiyoshi Ueda
Cc: Nikanth Karthikesan, linux-kernel@vger.kernel.org, dm-devel@redhat.com,
	Alasdair Kergon, Jens Axboe, "Jun'ichi Nomura", Vivek Goyal
Subject: Re: [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device
Message-ID: <20100519143900.GC24618@redhat.com>
References: <4BEA659F.9050206@ct.jp.nec.com> <20100513035750.GA25523@redhat.com>
	<4BED049C.5040409@ct.jp.nec.com> <20100514140852.GA10373@redhat.com>
	<4BF10BF1.3040108@ct.jp.nec.com> <20100517172737.GA24591@redhat.com>
	<4BF25091.3000507@ct.jp.nec.com> <20100518134639.GA27582@redhat.com>
	<4BF37DD5.9050409@ct.jp.nec.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-08-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 19 2010 at 8:01am -0400, Mike Snitzer wrote:

> On Wed, May 19, 2010 at 1:57 AM, Kiyoshi Ueda wrote:
> > Hi Mike,
> >
> > On 05/18/2010 10:46 PM +0900, Mike Snitzer wrote:
> >> I'll post v5 of the overall patch which will revert the mapped_device
> >> 'queue_lock' serialization that I proposed in v4.  v5 will contain
> >> the following patch to localize all table-load-related queue
> >> manipulation to the _hash_lock-protected critical section in
> >> table_load().  So it sets the queue up _after_ the table's type is
> >> established with dm_table_set_type().
> >
> > dm_table_setup_md_queue() may allocate memory in blocking mode.
> > Blocking allocation inside exclusive _hash_lock can cause deadlock;
> > e.g. when it has to wait for other dm devices to resume to free some
> > memory.
>
> We make no guarantees that other DM devices will resume before a table
> load -- so calling dm_table_setup_md_queue() within the exclusive
> _hash_lock is no different from other DM devices being suspended while
> a request-based DM device performs its first table_load().
>
> My thinking was that this should not be a problem, as it is only valid
> to call dm_table_setup_md_queue() before the newly created
> request-based DM device has been resumed.
>
> AFAIK we don't have any explicit constraints on memory allocations
> during table load (e.g. table loads shouldn't depend on other devices'
> writeback) -- but any GFP_KERNEL allocation could recurse
> (elevator_alloc() currently uses GFP_KERNEL with kmalloc_node)...
>
> I'll have to review the DM code further to see if all memory
> allocations during table_load() are done via mempools.  I'll also
> bring this up on this week's LVM call.

We discussed this and I understand the scope of the problem now.

Just reiterating what you covered when you first pointed this issue out:
a table load may block waiting on a memory allocation, and it can take
as long as it needs -- but it must not hold the exclusive _hash_lock
while it blocks, because holding _hash_lock prevents all further DM
ioctls.  Worse, the table load's allocation may be blocked waiting for
writeback to a DM device that will only be resumed by another thread.

Thanks again for pointing this out; I'll work to arrive at an
alternative locking scheme.  I'll likely introduce a lock local to the
mapped_device (effectively the 'queue_lock' I had before), with the
difference that I'd take that lock before taking _hash_lock.

I hope to have v6 available at some point today, but it may be delayed
by a day.

> > Also, your patch changes the queue configuration even when a table is
> > already active and used.  (e.g.
> > loading a bio-based table to a mapped_device
> > which is already active/used as request-based sets q->request_fn to
> > NULL.)  That could cause some critical problems.
>
> Yes, that is possible and I can add additional checks to prevent this.
> But this speaks to a more general problem with the existing DM code.
>
> dm_swap_table() has the negative check to prevent such table loads, e.g.:
>     /* cannot change the device type, once a table is bound */
>
> This check should come during table_load, as part of
> dm_table_set_type(), rather than during table resume.

I'll look to address this issue in v6 too.

Regards,
Mike
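P.S. To make the alternative locking scheme concrete, here is a rough
user-space sketch of the ordering I have in mind.  This is NOT the
actual dm code: pthread mutexes stand in for the kernel locks,
'hash_lock' stands in for dm-ioctl's _hash_lock, and the per-device
'queue_lock' is the hypothetical lock discussed above.

```c
#include <pthread.h>

/* Stand-in for dm-ioctl's global _hash_lock. */
static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;

struct mapped_device {
	pthread_mutex_t queue_lock;	/* hypothetical per-device lock */
	int queue_initialized;
};

static void table_load(struct mapped_device *md)
{
	/* Take the per-device lock first; the (possibly blocking) queue
	 * setup happens here and stalls only this device, not every
	 * DM ioctl in the system. */
	pthread_mutex_lock(&md->queue_lock);
	if (!md->queue_initialized) {
		/* dm_table_setup_md_queue() would run here (may block
		 * on a memory allocation). */
		md->queue_initialized = 1;
	}

	/* The global hash lock is then held only for the short,
	 * non-blocking hash-table update. */
	pthread_mutex_lock(&hash_lock);
	/* ... bind the table into the hash ... */
	pthread_mutex_unlock(&hash_lock);

	pthread_mutex_unlock(&md->queue_lock);
}
```

The key property is that the lock taken around the blocking work is
per-device, and the global lock is only ever acquired after it, so the
ordering is consistent and no ioctl waits on another device's setup.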
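P.S. A rough sketch of moving the type check to table_load time, as
suggested above.  Again NOT the actual dm code: the enum values and the
helper's signature are made up for illustration; the point is only that
a load which would change an already-bound device's type is rejected
up front instead of at resume/swap time.

```c
enum dm_queue_type {
	DM_TYPE_NONE,		/* no table bound yet */
	DM_TYPE_BIO_BASED,
	DM_TYPE_REQUEST_BASED,
};

struct mapped_device {
	enum dm_queue_type type;
};

/* Hypothetical load-time check: returns 0 on success, -1 if the
 * device's type is already fixed to something else. */
static int dm_table_set_type(struct mapped_device *md,
			     enum dm_queue_type new_type)
{
	/* cannot change the device type, once a table is bound */
	if (md->type != DM_TYPE_NONE && md->type != new_type)
		return -1;
	md->type = new_type;
	return 0;
}
```

With this in place, loading a bio-based table into an active
request-based device fails at table_load, so q->request_fn can never be
cleared out from under an in-use queue.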