From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [LSF/MM TOPIC] atomic block device
Date: Mon, 17 Feb 2014 08:05:07 -0500
Message-ID: <53020903.1050006@fb.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel , , , Jens Axboe , Bryan E Veal , Annie Foong
To: Dan Williams ,
Return-path:
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:59634 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752059AbaBQNCt (ORCPT ); Mon, 17 Feb 2014 08:02:49 -0500
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 02/15/2014 10:04 AM, Dan Williams wrote:
> In response to Dave's call [1] and highlighting Jeff's attend request
> [2] I'd like to stoke a discussion on an emulation layer for atomic
> block commands. Specifically, SNIA has laid out their position on the
> command set an atomic block device may support (NVM Programming Model
> [3]) and it is a good conversation piece for this effort. The goal
> would be to review the proposed operations, identify the capabilities
> that would be readily useful to filesystems / existing use cases, and
> tear down a straw man implementation proposal.
>
> The SNIA defined capabilities that seem the highest priority to implement are:
> * ATOMIC_MULTIWRITE - dis-contiguous LBA ranges, power fail atomic, no
> ordering constraint relative to other i/o
>
> * ATOMIC_WRITE - contiguous LBA range, power fail atomic, no ordering
> constraint relative to other i/o
>
> * EXISTS - not an atomic command, but defined in the NPM. It is akin
> to SEEK_{DATA|HOLE} to test whether an LBA is mapped or unmapped. If
> the LBA is mapped additionally specifies whether data is present or
> the LBA is only allocated.
>
> * SCAR - again not an atomic command, but once we have metadata can
> implement a bad block list, analogous to the bad-block-list support in
> md.
>
> Initial thought is that this functionality is better implemented as a
> library a block device driver (bio-based or request-based) can call to
> emulate these features. In the case where the feature is directly
> supported by the underlying hardware device the emulation layer will
> stub out and pass it through. The argument for not doing this as a
> device-mapper target or stacked block device driver is to ease
> provisioning and make the emulation transparent. On the other hand,
> the argument for doing this as a virtual block device is that the
> "failed to parse device metadata" is a known failure scenario for
> dm/md, but not sd for example.

Hi Dan,

I'd suggest a dm device instead of a special library, mostly because the
emulated device is likely to need some kind of cleanup action after a
crash, and the dm model is best suited to cleanly provide that. It's
also a good fit for people who want to duct tape a small amount of very
fast nvm onto relatively slower devices.

The absolute minimum to provide something useful is a 16K discontiguous
atomic write. That won't help the filesystems much, but it will allow
MySQL to turn off double buffering. Oracle would benefit from ~64K,
mostly from a safety point of view since they don't double buffer.

Helping the filesystems is harder: we need atomics bigger than any
individual device is likely to provide. But as Dave says elsewhere in
the thread, we can limit that for specific workloads.

I'm not sold on SCAR, since I'd expect the FTL or drive firmware to
provide that for us. What use case do you have in mind there?

-chris