From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 4ED6D6B0044 for ; Fri, 18 Dec 2009 14:39:42 -0500 (EST) Date: Fri, 18 Dec 2009 20:39:11 +0100 From: Ingo Molnar Subject: Re: Swap on flash SSDs Message-ID: <20091218193911.GA6153@elte.hu> References: <4B2A8D83.30305@redhat.com> <20091218051210.GA417@elte.hu> <1261161677.27372.1629.camel@nimitz> <4B2BD55A.10404@sgi.com> <1261164487.27372.1735.camel@nimitz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1261164487.27372.1735.camel@nimitz> Sender: owner-linux-mm@kvack.org To: Dave Hansen Cc: Mike Travis , Christoph Lameter , Rik van Riel , Andrea Arcangeli , linux-mm@kvack.org, Marcelo Tosatti , Adam Litke , Avi Kivity , Izik Eidus , Hugh Dickins , Nick Piggin , Mel Gorman , Andi Kleen , Benjamin Herrenschmidt , KAMEZAWA Hiroyuki , Chris Wright , Andrew Morton , "Stephen C. Tweedie" , Linus Torvalds List-ID: * Dave Hansen wrote: > On Fri, 2009-12-18 at 11:17 -0800, Mike Travis wrote: > > > Interesting discussion about SSD's. I was under the impression that with > > the finite number of write cycles to an SSD, that unnecessary writes were > > to be avoided? > > I'm no expert, but my impression was that this was a problem with other > devices and with "bare" flash, and mostly when writing to the same place > over and over. > > Modern, well-made flash SSDs and other flash devices have wear-leveling > built in so that they wear all of the flash cells evenly. There's still a > discrete number of writes that they can handle over their life, but it > should be high enough that you don't notice. > > http://en.wikipedia.org/wiki/Solid-state_drive A quality SDD is supposed to wear off in continuous non-stop write traffic after its Mean Time Between Failures. (Obviously it will take a few years for drives to gather that kind of true physical track record - right now what we have is the claims of manufacturers and 1-2 years of a track record.) And even when a cell does go bad and all the spares are gone, the failure mode is not catastrophic like with a hard disk, but that particular cell goes read-only and you can still recover the info and use the remaining cells. Sidenote: i think we should make the Linux swap code resilient against write IO errors of that fashion and reallocate the swap entry to a free slot. Right now in mm/page_io.c's end_swap_bio_write() we do this: /* * We failed to write the page out to swap-space. * Re-dirty the page in order to avoid it being reclaimed. * Also print a dire warning that things will go BAD (tm) * very quickly. * * Also clear PG_reclaim to avoid rotate_reclaimable_page() */ set_page_dirty(page); printk(KERN_ALERT "Write-error on swap-device (%u:%u:%Lu)\n", imajor(bio->bi_bdev->bd_inode), iminor(bio->bi_bdev->bd_inode), (unsigned long long)bio->bi_sector); ClearPageReclaim(page); We could be more intelligent than printing a scary error: we could clear that page from the swap map [permanently] and retry. It will still have a long-term failure mode when all swap pages are depleted - but that's still quite a slow failure mode and it is actionable via servicing. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org