From: Ingo Molnar <mingo@elte.hu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mike Travis <travis@sgi.com>,
Christoph Lameter <cl@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>, Mel Gorman <mel@csn.ul.ie>,
Andi Kleen <andi@firstfloor.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Chris Wright <chrisw@sous-sol.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Stephen C. Tweedie" <sct@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Swap on flash SSDs
Date: Fri, 18 Dec 2009 20:39:11 +0100
Message-ID: <20091218193911.GA6153@elte.hu>
In-Reply-To: <1261164487.27372.1735.camel@nimitz>
* Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> On Fri, 2009-12-18 at 11:17 -0800, Mike Travis wrote:
>
> > Interesting discussion about SSD's. I was under the impression that with
> > the finite number of write cycles to an SSD, that unnecessary writes were
> > to be avoided?
>
> I'm no expert, but my impression was that this was a problem with other
> devices and with "bare" flash, and mostly when writing to the same place
> over and over.
>
> Modern, well-made flash SSDs and other flash devices have wear-leveling
> built in so that they wear all of the flash cells evenly. There's still a
> discrete number of writes that they can handle over their life, but it
> should be high enough that you don't notice.
>
> http://en.wikipedia.org/wiki/Solid-state_drive
A quality SSD is supposed to wear out, even under continuous non-stop write
traffic, only after its Mean Time Between Failures. (Obviously it will take a
few years for drives to gather that kind of true physical track record - right
now what we have is the claims of manufacturers and 1-2 years of a track record.)
And even when a cell does go bad and all the spares are gone, the failure mode
is not catastrophic like with a hard disk, but that particular cell goes
read-only and you can still recover the info and use the remaining cells.
Sidenote: i think we should make the Linux swap code resilient against write
IO errors of that fashion and reallocate the swap entry to a free slot. Right
now in mm/page_io.c's end_swap_bio_write() we do this:
	/*
	 * We failed to write the page out to swap-space.
	 * Re-dirty the page in order to avoid it being reclaimed.
	 * Also print a dire warning that things will go BAD (tm)
	 * very quickly.
	 *
	 * Also clear PG_reclaim to avoid rotate_reclaimable_page()
	 */
	set_page_dirty(page);
	printk(KERN_ALERT "Write-error on swap-device (%u:%u:%Lu)\n",
			imajor(bio->bi_bdev->bd_inode),
			iminor(bio->bi_bdev->bd_inode),
			(unsigned long long)bio->bi_sector);
	ClearPageReclaim(page);
We could be more intelligent than printing a scary error: we could clear that
page from the swap map [permanently] and retry. It will still have a long-term
failure mode when all swap pages are depleted - but that's still quite a slow
failure mode and it is actionable via servicing.
Ingo
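
The retry idea in the sidenote above can be sketched as a small userspace
model. This is not kernel code and not a proposed patch: every name below
(struct swap_model, model_write(), model_alloc_slot(), model_swap_out(),
model_bad_slots(), the SLOT_* states) is hypothetical and exists only for
illustration. It assumes a write error is reported synchronously per slot,
which the real bio completion path does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_NSLOTS 8

enum slot_state { SLOT_FREE, SLOT_USED, SLOT_BAD };

struct swap_model {
	enum slot_state map[MODEL_NSLOTS];	/* stand-in for the swap map */
	bool worn_out[MODEL_NSLOTS];		/* simulated cells that reject writes */
};

/* Simulated device write: fails if the underlying cell has gone read-only. */
static bool model_write(const struct swap_model *sm, size_t slot)
{
	assert(slot < MODEL_NSLOTS);
	return !sm->worn_out[slot];
}

/* Claim the first free slot and return its index, or -1 if none remain. */
static int model_alloc_slot(struct swap_model *sm)
{
	for (size_t i = 0; i < MODEL_NSLOTS; i++) {
		if (sm->map[i] == SLOT_FREE) {
			sm->map[i] = SLOT_USED;
			return (int)i;
		}
	}
	return -1;
}

/*
 * Write one page out. On a write error, retire the slot permanently and
 * retry on another free slot. Returns the slot that finally took the page,
 * or -1 once every slot is used or retired (the slow failure mode that
 * calls for servicing).
 */
static int model_swap_out(struct swap_model *sm)
{
	int slot;

	while ((slot = model_alloc_slot(sm)) >= 0) {
		if (model_write(sm, (size_t)slot))
			return slot;
		sm->map[slot] = SLOT_BAD;	/* never hand this slot out again */
	}
	return -1;
}

/* Count the slots that have been permanently retired. */
static size_t model_bad_slots(const struct swap_model *sm)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_NSLOTS; i++)
		if (sm->map[i] == SLOT_BAD)
			n++;
	return n;
}
```

In this model a caller zero-initializes a struct swap_model, marks some
worn_out[] entries, and calls model_swap_out(); failed slots end up in
SLOT_BAD and the page lands in the first slot that accepts the write. A real
implementation would additionally have to re-dirty the page, handle the
asynchronous completion context, and keep the bad-slot marking persistent
across swapoff/swapon, none of which is modeled here.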