From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ric Wheeler
Subject: Re: thin provisioned LUN support
Date: Fri, 07 Nov 2008 11:22:59 -0500
Message-ID: <49146B63.70208@redhat.com>
References: <4913028B.6010405@redhat.com>
 <1225984628.4703.80.camel@localhost.localdomain>
 <20081107120534.GO21867@kernel.dk>
 <1226072970.15281.46.camel@think.oraclecorp.com>
 <1226074002.8030.33.camel@localhost.localdomain>
 <1226074270.15281.50.camel@think.oraclecorp.com>
 <1226074710.8030.43.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Chris Mason, "Martin K. Petersen", Jens Axboe, David Woodhouse,
 linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 Black_David@emc.com, Tom Coughlan, Matthew Wilcox
To: James Bottomley
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:60394 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752450AbYKGQXL (ORCPT ); Fri, 7 Nov 2008 11:23:11 -0500
In-Reply-To: <1226074710.8030.43.camel@localhost.localdomain>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

James Bottomley wrote:
> On Fri, 2008-11-07 at 11:11 -0500, Chris Mason wrote:
>
>> On Fri, 2008-11-07 at 10:06 -0600, James Bottomley wrote:
>>
>>> On Fri, 2008-11-07 at 11:00 -0500, Martin K. Petersen wrote:
>>>
>>>>>>>>> "Chris" == Chris Mason writes:
>>>>>>>>>
>>>> Chris> Hmmm, it's surprising to me that arrays who tell us please use
>>>> Chris> the noop elevator suddenly want us to merge discard requests.
>>>> Chris> The array really needs to be able to deal with this internally.
>>>>
>>>> Let's also not forget that we're talking about merging discard
>>>> requests for the purpose making internal array housekeeping efficient.
>>>> That involves merging discards up to the internal array block sizes
>>>> which may be on the order of 512/768/1024 KB.
>>>>
>>>> If we were talking about merging discards up to a 4/8/16 KB boundary
>>>> that might be something we'd have a chance to do within a reasonable
>>>> amount of time (bigger than normal read/write I/O but not hours).
>>>>
>>>> But keeping discard state around for long enough to attempt to
>>>> aggregate 768KB (and 768KB-aligned) chunks is icky.
>>>>
>>> Icky but possible. It's the same rb tree affair we use to keep vma
>>> lists (with the same characteristics). The point is that technically we
>>> can do this pretty easily ... all the way down to not losing any
>>> potential discards that the array would ignore. However, procedurally
>>> it would certainly be sending the wrong message to the array vendors
>>> (the message being "sure the OS will sanitise any crap you care to
>>> dump").
>>>
>>> On the other hand, if we have to do it for flash and MMC anyway ...
>>>
>> It doesn't seem like a good idea to maintain a ton of code that gets
>> exercised so rarely, especially wrt filesystem crashes.
>>
>
> Heh, am I the only person here who deletes files on a regular basis
> (principally to get my disk down from 99%)?
>
>
>> Just testing it would be a fairly large challenge, spread out across N
>> filesystems. I think we need to keep discard as simple as we possibly
>> can.
>>
>
> I don't disagree with that ... I'm not saying we *should* merely that we
> *could*.
>
> James
>
>

I agree that simple and robust are key, but we will need to try and do
reasonable coalescing of the requests.

Depending on how vendors implement those unmap commands, sending down a
sequence of commands might cause a performance issue if done at too fine
a granularity.

Easiest way to handle that is to make sure that we have a way of
disabling the unmap/discard support (mount option?).

Ric
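
For illustration only, here is a rough sketch of the kind of coalescing
discussed in this thread: merge adjacent discard ranges, then keep only the
parts that cover whole, aligned array chunks. It is not kernel code and not
anyone's proposed implementation. The function names (coalesce,
aligned_chunks) and the KB constant are hypothetical, the 768 KB chunk size
is just the example figure quoted above, and a sorted list stands in for the
rb tree James mentions.

```python
KB = 1024  # hypothetical helper constant for readability


def coalesce(ranges):
    """Merge overlapping or adjacent (start, end) byte ranges.

    'end' is exclusive. Returns a sorted list of disjoint ranges.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def aligned_chunks(ranges, chunk):
    """Return the chunk-aligned sub-ranges fully covered by the discards.

    Anything that does not fill a whole aligned chunk is left out, on the
    assumption that the array would ignore a partial-chunk discard anyway.
    """
    out = []
    for start, end in coalesce(ranges):
        first = -(-start // chunk) * chunk  # round start up to a boundary
        last = (end // chunk) * chunk       # round end down to a boundary
        if first < last:
            out.append((first, last))
    return out


if __name__ == "__main__":
    # Three small discards that together fill one 768 KB chunk, plus a
    # stray 100 KB discard that does not cover any whole chunk.
    pending = [(0, 256 * KB), (256 * KB, 512 * KB),
               (512 * KB, 768 * KB), (1000 * KB, 1100 * KB)]
    print(aligned_chunks(pending, 768 * KB))
```

The sketch drops the leftover partial ranges, whereas James's point about
"not losing any potential discards" would mean keeping them around until
their neighbours are freed too. That retained state is exactly what Martin
calls icky.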