Re: swap on eMMC and other flash

From: Minchan Kim <minchan@kernel.org>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linaro-kernel@lists.linaro.org, android-kernel@googlegroups.com,
	linux-mm@kvack.org, "Luca Porzio (lporzio)" <lporzio@micron.com>,
	Alex Lemberg <alex.lemberg@sandisk.com>,
	linux-kernel@vger.kernel.org,
	Saugata Das <saugata.das@linaro.org>,
	Venkatraman S <venkat@linaro.org>,
	Yejin Moon <yejin.moon@samsung.com>,
	Hyojin Jeong <syr.jeong@samsung.com>,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
Subject: Re: swap on eMMC and other flash
Date: Thu, 12 Apr 2012 11:36:17 +0900
Message-ID: <4F863FA1.3090707@kernel.org>
In-Reply-To: <201204111557.14153.arnd@arndb.de>

On 04/12/2012 12:57 AM, Arnd Bergmann wrote:
> On Wednesday 11 April 2012, Minchan Kim wrote:
>> On Tue, Apr 10, 2012 at 08:32:51AM +0000, Arnd Bergmann wrote:
>>>>
>>>> I should have used a more general term. I meant write amplification, but
>>>> WAF (Write Amplification Factor) is the more popular term. :(
>>>
>>> D'oh. Thanks for the clarification. Note that the entire idea of increasing the
>>> swap cluster size to the erase block size is to *reduce* write amplification:
>>>
>>> If we pick arbitrary swap clusters that are part of an erase block (or worse,
>>> span two partial erase blocks), sending a discard for one cluster does not
>>> allow the device to actually discard an entire erase block. Consider the best
>>> possible scenario where we have a 1MB cluster and 2MB erase blocks, all
>>> naturally aligned. After we have written the entire swap device once, all
>>> blocks are marked as used in the device, but some are available for reuse
>>> in the kernel. The swap code picks a cluster that is currently unused and
>>> sends a discard to the device, then fills the cluster with new pages.
>>> After that, we pick another swap cluster elsewhere. The erase block now
>>> contains 50% new and 50% old data and has to be garbage collected, so the
>>> device writes 2MB of data to another erase block. So, in order to write 1MB,
>>> the device has written 3MB and the write amplification factor is 3. Using
>>> 8MB erase blocks, it would be 9.
>>>
>>> If we do the active compaction and increase the cluster size to the erase
>>> block size, there is no write amplification inside of the device (and no
>>> stalls from the garbage collection, which are the other concern), and
>>> we only need to write a few blocks again that are still valid in a cluster
>>> at the time we want to reuse it. On an ideal device, the write amplification
>>> for active compaction should be exactly the same as what we get when we
>>> write a cluster while some of the data in it is still valid and we skip
>>> those pages, while some devices might not like having to gc themselves.
>>> Doing the compaction in software means we have to spend CPU cycles on it,
>>> but we get to choose when it happens and don't have to block on the device
>>> during GC.
>>
>> Thanks for the detailed explanation.
>> At the least, we need active compaction to avoid GC completely when we can't
>> find an empty cluster and there are lots of holes.
>> The indirection layer we discussed at the last LSF/MM could make it easy to
>> change slots during compaction.
>> I think the way we find an empty cluster should change, because the current
>> linear scan is not suitable for bigger cluster sizes.
>>
>> I am looking forward to your work!
>>
>> P.S.) I'm afraid this work might once again start the endless war over what
>> the host can do well vs. what the device can do well. If we can work this out,
>> we don't need a costly eMMC FTL, just dumb bare NAND, a controller and simple
>> firmware.
>
> IMHO, we should only distinguish between dumb and smart devices, defined as follows:
>
> 1. smart devices behave like all but the extremely cheap SSDs. They are optimized
> for 4KB random I/O, and the erase block size is not visible because there is
> a write cache and a flexible controller between the block device abstraction
> and the raw flash.
>
> 2. dumb devices have very visible effects that stem from a simplistic remapping
> layer that translates logical erase block numbers into physical erase blocks,
> and only a fixed number of those can be written at the same time before forcing
> GC. Writes smaller than page size are strongly discouraged here. There is no
> RAM to cache writes in the controller, but we still expect these devices to
> have a reasonable wear levelling policy.  This covers almost all of today's
> eMMC, SD, USB and CF as well as some cheap ATA SSD.

Such dumb devices have the following disadvantage:
some users expect the device to manage itself and some don't, so
someone like you will add smart features on the host to avoid GC, while
someone else still believes that eMMC by itself will do well enough that
he can use any FS on it.

A conflict arises.

Although we can solve several of the problems with using eMMC as swap,
other partitions could be used by any FS that is not aware of eMMC
characteristics. That could trigger GC inside the eMMC even though we have
made swap on eMMC work well, so swap I/O could still see long latencies.
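
For reference, the write amplification figures in the quoted example (3 for
2MB erase blocks, 9 for 8MB) can be reproduced with a small sketch. It is
only an illustration of that simplified model, not kernel code: the function
name and the restriction to naturally aligned sizes where one size divides the
other are my assumptions, and real devices will behave less ideally.

```python
def write_amplification(cluster_mb, erase_block_mb):
    """Write amplification factor under the simplified model quoted above.

    Assumptions (hypothetical helper, not an existing kernel interface):
    clusters and erase blocks are naturally aligned, and one size evenly
    divides the other.
    """
    if cluster_mb <= 0 or erase_block_mb <= 0:
        raise ValueError("sizes must be positive")

    if cluster_mb % erase_block_mb == 0:
        # The cluster covers whole erase blocks, so a discard frees them
        # entirely and the device has nothing to garbage collect.
        return 1.0

    if erase_block_mb % cluster_mb == 0:
        # The cluster is only part of an erase block: the device writes the
        # new cluster data and later copies the whole erase block during GC.
        return (cluster_mb + erase_block_mb) / cluster_mb

    raise ValueError("model only covers sizes where one divides the other")


if __name__ == "__main__":
    for cluster, erase in [(1, 2), (1, 8), (2, 2), (8, 8)]:
        print(cluster, erase, write_amplification(cluster, erase))
```

With 1MB clusters the factor grows with the erase block size, while a cluster
equal to the erase block size keeps it at 1 inside the device. Any rewriting
of still-valid pages done by host-side compaction is not counted here.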

>
> A third category is of course spinning rust, but I think with the distinction
> for solid state media above, we have a pretty good grip on all existing
> media. As eMMC and UFS evolve over time, we might want to stick them into the
> first category, but I don't think we need more categories.
>
> 	Arnd
>
