From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ric Wheeler <rwheeler@redhat.com>
Subject: Re: thin provisioned LUN support
Date: Fri, 07 Nov 2008 11:22:59 -0500
Message-ID: <49146B63.70208@redhat.com>
References: <4913028B.6010405@redhat.com>	 <1225984628.4703.80.camel@localhost.localdomain>	 <Pine.LNX.4.64.0811061519350.3431@Nokia-N800-26>	 <20081107120534.GO21867@kernel.dk>	 <1226072970.15281.46.camel@think.oraclecorp.com>	 <yq1r65na5ll.fsf@sermon.lab.mkp.net>	 <1226074002.8030.33.camel@localhost.localdomain>	 <1226074270.15281.50.camel@think.oraclecorp.com> <1226074710.8030.43.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Chris Mason <chris.mason@oracle.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	David Woodhouse <dwmw2@infradead.org>,
	linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Black_David@emc.com, Tom Coughlan <coughlan@redhat.com>,
	Matthew Wilcox <matthew@wil.cx>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx2.redhat.com ([66.187.237.31]:60394 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752450AbYKGQXL (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 7 Nov 2008 11:23:11 -0500
In-Reply-To: <1226074710.8030.43.camel@localhost.localdomain>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

James Bottomley wrote:
> On Fri, 2008-11-07 at 11:11 -0500, Chris Mason wrote:
>   
>> On Fri, 2008-11-07 at 10:06 -0600, James Bottomley wrote:
>>     
>>> On Fri, 2008-11-07 at 11:00 -0500, Martin K. Petersen wrote:
>>>       
>>>>>>>>> "Chris" == Chris Mason <chris.mason@oracle.com> writes:
>>>>>>>>>                   
>>>> Chris> Hmmm, it's surprising to me that arrays who tell us please use
>>>> Chris> the noop elevator suddenly want us to merge discard requests.
>>>> Chris> The array really needs to be able to deal with this internally.
>>>>
>>>> Let's also not forget that we're talking about merging discard
>>>> requests for the purpose of making internal array housekeeping efficient.
>>>> That involves merging discards up to the internal array block sizes
>>>> which may be on the order of 512/768/1024 KB.
>>>>
>>>> If we were talking about merging discards up to a 4/8/16 KB boundary
>>>> that might be something we'd have a chance to do within a reasonable
>>>> amount of time (bigger than normal read/write I/O but not hours).
>>>>
>>>> But keeping discard state around for long enough to attempt to
>>>> aggregate 768KB (and 768KB-aligned) chunks is icky.
>>>>         
>>> Icky but possible.  It's the same rb tree affair we use to keep vma
>>> lists (with the same characteristics).  The point is that technically we
>>> can do this pretty easily ... all the way down to not losing any
>>> potential discards that the array would ignore.  However, procedurally
>>> it would certainly be sending the wrong message to the array vendors
>>> (the message being "sure the OS will sanitise any crap you care to
>>> dump").
>>>
>>> On the other hand, if we have to do it for flash and MMC anyway ...
>>>       
>> It doesn't seem like a good idea to maintain a ton of code that gets
>> exercised so rarely, especially wrt filesystem crashes.
>>     
>
> Heh, am I the only person here who deletes files on a regular basis
> (principally to get my disk down from 99%)?
>
>   
>> Just testing it would be a fairly large challenge, spread out across N
>> filesystems.  I think we need to keep discard as simple as we possibly
>> can.
>>     
>
> I don't disagree with that ... I'm not saying we *should* merely that we
> *could*.
>
> James
>
>   
I agree that simple and robust are key, but we will still need to do 
reasonable coalescing of the requests.

Depending on how vendors implement the unmap commands, sending down a 
sequence of commands at too fine a granularity might cause a performance 
problem. The easiest way to handle that is to make sure that we have a 
way of disabling the unmap/discard support (a mount option?).
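
To make the coalescing idea concrete, here is a minimal sketch (plain 
Python, not kernel code). It merges touching freed extents and then emits 
only the parts that cover whole array chunks. The names merge_extents and 
aligned_discards, the "enabled" flag standing in for a mount option, and 
the 768 KB chunk size in the usage note are all illustrative assumptions, 
not an existing interface.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start, end) in bytes, end exclusive


def merge_extents(extents: List[Extent]) -> List[Extent]:
    """Merge overlapping or touching extents into maximal runs."""
    merged: List[Extent] = []
    for start, end in sorted(extents):
        if merged and start <= merged[-1][1]:
            # Extends (or sits inside) the previous run: grow it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def aligned_discards(extents: List[Extent], chunk: int,
                     enabled: bool = True) -> List[Extent]:
    """Return the chunk-aligned discards worth sending to the device.

    'enabled' is a hypothetical stand-in for a mount option that turns
    discard support off entirely.
    """
    if not enabled:
        return []
    out: List[Extent] = []
    for start, end in merge_extents(extents):
        first = -(-start // chunk) * chunk  # round start up to a chunk boundary
        last = (end // chunk) * chunk       # round end down to a chunk boundary
        if first < last:
            out.append((first, last))
    return out
```

With chunk = 768 * 1024, three small frees that together span 0..790528 
collapse into a single discard of (0, 786432). Note that this sketch 
simply drops the unaligned remainder instead of remembering it for a 
later merge, which is the simple end of the trade-off discussed above.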

Ric