From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [PATCH 1/2] block: Add support for atomic writes
Date: Thu, 7 Nov 2013 10:55:54 -0500
Message-ID: <20131107155554.3802.2587@localhost.localdomain>
References: <20131101212704.10239.73920@localhost.localdomain>
 <20131101212854.10239.19830@localhost.localdomain>
 <20131107135220.3802.91392@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8BIT
Cc: Matthew Wilcox, Linux FS Devel, Jens Axboe
To: Jeff Moyer
Return-path:
Received: from dkim2.fusionio.com ([66.114.96.54]:56209 "EHLO
 dkim2.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754404Ab3KGPz7 convert rfc822-to-8bit (ORCPT );
 Thu, 7 Nov 2013 10:55:59 -0500
Received: from mx2.fusionio.com (unknown [10.101.1.160])
 by dkim2.fusionio.com (Postfix) with ESMTP id C42289A06B0
 for ; Thu, 7 Nov 2013 08:55:58 -0700 (MST)
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Quoting Jeff Moyer (2013-11-07 10:43:41)
> Chris Mason writes:
>
> > Unfortunately, it's hard to say. I think the fusionio cards are the
> > only shipping devices that support this, but I've definitely heard that
> > others plan to support it as well. mariadb/percona already support the
> > atomics via fusionio specific ioctls, and turning that into a real
> > O_ATOMIC is a priority so other hardware can just hop on the train.
> >
> > This feature in general is pretty natural for the log structured squirrels
> > they stuff inside flash, so I'd expect everyone to support it. Matthew,
> > how do you feel about all of this?
> >
> > With the fusionio drivers, we've recently increased the max atomic size.
> > It's basically 1MB, disjoint or contig doesn't matter. We're powercut
> > safe at 1MB.
> >
> >>
> >> Basically, I'd like to avoid requiring a trial and error programming
> >> model to determine what an application can expect to work (like we have
> >> with O_DIRECT right now).
> >
> > I'm really interested in ideas on how to provide that. But, with dm,
> > md, and a healthy assortment of flash vendors, I don't know how...
>
> Well, we have control over dm and md, so I'm not worried about that.
> For the storage vendors, we'll have to see about influencing the
> standards bodies.
>
> The way I see it, there are 3 pieces of information that are required:
> 1) minimum size that is atomic (likely the physical block size, but
>    maybe the logical block size?)
> 2) maximum size that is atomic (multiple of minimum size)
> 3) whether or not discontiguous ranges are supported
>
> Did I miss anything?

It'll vary from vendor to vendor. A discontig range of two 512KB areas
is different from 256 discontig 4KB areas.

And it's completely dependent on filesystem fragmentation. So, a given
IO might pass for one file and fail for the next.

In a DM/MD configuration, an atomic IO inside a single stripe on raid0
could succeed, while it will fail if it spans two stripes on two
different devices.

>
> > I've attached my current test program. The basic idea is to fill
> > buffers (1MB in size) with a random pattern. Each buffer has a
> > different random pattern.
> >
> > You let it run for a while and then pull the plug. After the box comes
> > back up, run the program again and it looks for consistent patterns
> > filling each 1MB aligned region in the file.
> [snip]
> > In order to reliably find torn blocks without O_ATOMIC, I had to bump
> > the write size to 1MB and run 24 instances in parallel.
>
> Thanks for the program (I actually have my own setup for verifying torn
> writes, the veritable dainto[1], which nobody uses). Just to be certain,
> you did bump /sys/block//queue/max_sectors_kb to 1MB, right?

Since the atomics patch does things as a list of bios, there's no
max_sectors_kb to worry about. Each individual bio was only 4K.

-chris
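
The torn-write check described in the quoted text (fill each 1MB buffer
with its own random pattern, then after a power cut look for one
consistent pattern in every 1MB-aligned region) can be sketched roughly
as below. This is not the attached test program, only an illustration.
The names make_pattern and find_torn_regions, and the choice of
repeating a single random 8-byte word across each region, are
assumptions made for the sketch. It works on an in-memory byte string
rather than on a file.

```python
import os

REGION = 1024 * 1024  # 1MB write size, as in the message above
WORD = 8              # assumption: each region repeats one random 8-byte word


def make_pattern(region=REGION, word=WORD):
    """Return a region-sized buffer filled with one fresh random word."""
    return os.urandom(word) * (region // word)


def find_torn_regions(data, region=REGION, word=WORD):
    """Return indices of aligned regions that do not hold a single pattern."""
    torn = []
    for start in range(0, len(data) - region + 1, region):
        chunk = data[start:start + region]
        # A region is consistent if its first word repeats across all of it.
        if chunk != chunk[:word] * (region // word):
            torn.append(start // region)
    return torn


if __name__ == "__main__":
    # Simulate three regions, then tear the middle one by overwriting its
    # second half with a different pattern, as an interrupted write might.
    image = bytearray(b"".join(make_pattern() for _ in range(3)))
    half = REGION // 2
    image[REGION + half:2 * REGION] = make_pattern()[:half]
    print(find_torn_regions(bytes(image)))
```

A region that was never written (all zeros) also counts as consistent
under this check, so only a region holding a mix of two patterns is
reported as torn.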