linux-mtd.lists.infradead.org archive mirror
* UBI/UBIFS: dealing with MLC's paired pages
@ 2015-09-17 13:22 Boris Brezillon
  2015-09-17 15:20 ` Artem Bityutskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-17 13:22 UTC (permalink / raw)
  To: Artem Bityutskiy, Richard Weinberger
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Qi Wang 王起 (qiwang), Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

Hello,

I'm currently working on the paired pages problem we have on MLC chips.
I remember discussing it with Artem earlier this year when I was
preparing my talk for ELC.

I now have some time I can spend working on this problem and I started
looking at how this can be solved.

First let's take a look at the UBI layer.
There's one basic thing we have to care about: protecting UBI metadata.
There are two kinds of metadata:
1/ those stored at the beginning of each erase block (EC and VID
   headers)
2/ those stored in specific volumes (layout and fastmap volumes)

We don't have to worry about #2 since those are written using atomic
updates, and atomic updates are immune to this paired page corruption
problem (either the whole write is valid, or none of it is valid).
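To illustrate why atomic updates are immune, here is a toy model (names and layout are made up for illustration, not the actual UBI code): the new content is written to a spare PEB first, and the LEB mapping is switched only after the write fully completed, so a power cut at any point leaves either the complete old copy or the complete new copy mapped.

```c
#include <assert.h>
#include <string.h>

/* Toy model of an atomic LEB update (illustrative only, not UBI code). */
struct leb {
	char pebs[2][16];   /* two physical copies */
	int  mapped;        /* which copy the LEB currently points at */
};

/* Program the spare PEB; 'cut_after' bytes are written before a
 * simulated power cut (-1 means no power cut). The mapping is only
 * switched once the whole write succeeded. */
static void atomic_update(struct leb *l, const char *data, int cut_after)
{
	int spare = !l->mapped;
	int n = (int)strlen(data) + 1;

	if (cut_after >= 0 && cut_after < n) {
		memcpy(l->pebs[spare], data, cut_after); /* partial write... */
		return;                                  /* ...mapping untouched */
	}
	memcpy(l->pebs[spare], data, n);
	l->mapped = spare;                               /* the "atomic" switch */
}
```

Cutting the power mid-write only loses the spare copy; the mapped data stays valid.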

This leaves problem #1.
For this case, Artem suggested to duplicate the EC header in the VID
header so that if page 0 is corrupted we can recover the EC info from
page 1 (which will contain both VID and EC info).
Doing that is fine for dealing with EC header corruption, since, AFAIK,
none of the NAND vendors are pairing page 0 with page 1.
The VID header corruption problem still remains. To prevent it we have
several solutions:
a/ skip the page paired with the VID header. This is doable and can be
   hidden from UBI users, but it also means that we're losing another
   page for metadata (not a negligible overhead)
b/ store the VID info (PEB <-> LEB association) somewhere else. Fastmap
   seems the right place to put it, since fastmap already stores this
   information for almost all blocks. Still, we would have to modify
   fastmap a bit to store information about all erase blocks and not
   only those that are not part of the fastmap pool.
   Also, updating that in real time would require using a log approach,
   instead of the atomic update currently used by fastmap when it runs
   out of PEBs in its free PEB pool. Note that the log approach does
   not have to be applied to all fastmap data (we just need it for the
   PEB <-> LEB info).
   Another off-topic note regarding the suggested log approach: we
   could also use it to log which PEB was last written/erased, and use
   that to handle the unstable bits issue.
c/ (also suggested by Artem) delay the VID write until we have enough
   data to write on the LEB, and thus guarantee that it cannot be
   corrupted (at least by programming the paired page ;-)) anymore.
   Doing that would also require logging the data to be written on
   those LEBs somewhere, not to mention the impact of copying the data
   twice (once to the log, and then, when we have enough data, to the
   real block).
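Going back to Artem's suggestion of duplicating the EC header in the VID header, here is a rough sketch of what such an on-flash layout could look like (field names and sizes are my own, not the actual ubi_vid_hdr): page 0 keeps the normal EC header, while page 1 carries the VID fields plus a redundant EC copy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a VID header carrying a redundant copy of the
 * EC information, so a corrupted page 0 only loses the primary copy.
 * Not the real on-flash format; for illustration only. */
struct vid_hdr_with_ec {
	uint32_t magic;
	uint8_t  version;
	uint8_t  vol_type;
	uint8_t  copy_flag;
	uint8_t  compat;
	uint32_t vol_id;          /* volume this PEB belongs to */
	uint32_t lnum;            /* LEB number inside the volume */
	/* ... other VID fields elided ... */
	uint64_t ec;              /* duplicated erase counter (EC header copy) */
	uint32_t vid_hdr_offset;  /* duplicated EC header fields */
	uint32_t data_offset;
	uint32_t hdr_crc;         /* CRC covering all of the above */
};
```

The point of the sketch is just that the redundant EC fields fit comfortably in the page-1 header without growing it beyond its current 64 bytes.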

I don't have any strong opinion about which solution is best, and I may
be missing other aspects or better solutions, so feel free to comment
and share your thoughts.

That's all for the UBI layer. We will likely need new functions (and
new fields in existing structures) to help UBI users deal with MLC
NANDs: for example a field exposing the storage type or a function
helping users skip one (or several) blocks to secure the data they have
written so far. Anyway, those are things we can discuss after deciding
which approach we want to take.

Now, let's talk about the UBIFS layer. We are facing pretty much the
same problem there: from time to time, we need to protect the data we
have already written.
AFAIU (correct me if I'm wrong), data should be secure when we sync the
file system, or commit the UBIFS journal (feel free to correct me if
I'm not using the right terms in my explanation).
As explained earlier, the only way to secure data is to skip some pages
(those that are paired with the already written ones).

I see two approaches here (there might be more):
1/ do not skip any pages until we are asked to secure the data, and
   then skip as many pages as needed to ensure nobody can ever corrupt
   the data. With this approach you can lose a non-negligible amount
   of space. For example, with this paired pages scheme [1], if you
   have only written up to page 2 and want to secure your data, you'll
   have to skip pages 3 to 8.
2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a
   block). With this solution you always lose half the NAND capacity,
   but in case of small writes, it's still more efficient than #1.
   Of course using that solution alone is not acceptable, because
   you'd only be able to use half the NAND capacity, but the plan is
   to use it in conjunction with the GC, so that from time to time
   UBIFS data chunks/nodes can be put in a single erase block without
   skipping half the pages.
   Note that currently the GC does not work this way: it tries to
   collect chunks one by one and write them to the journal to free a
   dirty LEB. What we would need here is a way to collect enough data
   to fill an entire block and, after that, release the LEBs that were
   previously using only half the LEB capacity.
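To put numbers on the cost of approach #1, here is a sketch of the skip computation. The paired_upper() function is my approximation of the pairing table in [1] (pairs (0,4) and (1,5), then +6 for the other lower pages), so double-check it against chapter 6.1 before relying on it:

```c
#include <assert.h>

/* Approximation of the H27UBG8T2BTR pairing table [1]: pages 0 and 1
 * pair at +4, the other lower pages (2, 3, 6, 7, 10, 11, ...) pair at
 * +6. Returns the upper page paired with lower page 'p', or -1 when
 * 'p' is itself an upper page. */
static int paired_upper(int p)
{
	if (p == 0 || p == 1)
		return p + 4;
	if (p % 4 == 2 || p % 4 == 3)
		return p + 6;
	return -1;
}

/* First page that can be programmed without risking corruption of
 * pages 0..last on a power cut: skip everything up to the highest
 * upper page paired with an already written page. */
static int next_safe_page(int last)
{
	int max_upper = last;

	for (int p = 0; p <= last; p++)
		if (paired_upper(p) > max_upper)
			max_upper = paired_upper(p);
	return max_upper + 1;
}
```

With last = 2 this returns 9, i.e. pages 3 to 8 are skipped, matching the example above.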

Of course both of those solutions imply marking the skipped regions
as dirty so that the GC can account for the padded space. For #1 we
should probably also use padding nodes to reflect how much space is lost
on the media, though I'm not sure how this can be done. For #2, we may
have to differentiate 'full' and 'half' LEBs in the LPT.

Anyway, all the above are just some ideas I had or suggestions I got
from other people and I wanted to share. I'm open to any new
suggestions, because none of the proposed solutions are easy to
implement.

Best Regards,

Boris

P.S.: Note that I'm not discussing the WP solution on purpose: I'd like
      to have a solution that is completely HW independent.

[1]https://www.olimex.com/Products/Components/IC/H27UBG8T2BTR/resources/H27UBG8T2BTR.pdf,
   chapter 6.1. Paired Page Address Information

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
@ 2015-09-17 15:20 ` Artem Bityutskiy
  2015-09-17 15:46   ` Boris Brezillon
  2015-09-29 11:19 ` Richard Weinberger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-09-17 15:20 UTC (permalink / raw)
  To: Boris Brezillon, Richard Weinberger
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Qi Wang 王起 (qiwang), Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

On Thu, 2015-09-17 at 15:22 +0200, Boris Brezillon wrote:
> Hello,
> 
> I'm currently working on the paired pages problem we have on MLC
> chips.
> I remember discussing it with Artem earlier this year when I was
> preparing my talk for ELC.

Hi Boris,

excellent summary, very structured. I won't generate any new ideas now,
just a suggestion on implementation tactics.

For an implementation, I'd start with a power-cut emulator which
emulates paired pages. I'd probably do it in UBI, maybe lower. I'd
also write a good UBI power-cut test application. And then I'd start
playing with various implementation approaches. I'd use the test-driven
approach.

Artem.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 15:20 ` Artem Bityutskiy
@ 2015-09-17 15:46   ` Boris Brezillon
  2015-09-17 16:47     ` Richard Weinberger
  0 siblings, 1 reply; 43+ messages in thread
From: Boris Brezillon @ 2015-09-17 15:46 UTC (permalink / raw)
  To: dedekind1
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Qi Wang 王起 "(qiwang)",
	Iwo Mergler, Jeff Lauruhn (jlauruhn)

Hi Artem,

On Thu, 17 Sep 2015 18:20:39 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Thu, 2015-09-17 at 15:22 +0200, Boris Brezillon wrote:
> > Hello,
> > 
> > I'm currently working on the paired pages problem we have on MLC
> > chips.
> > I remember discussing it with Artem earlier this year when I was
> > preparing my talk for ELC.
> 
> Hi Boris,
> 
> excellent summary, very structured. I won't generate any new idea now,
> just an implementation tactics suggestion.

I'm taking that too :-).

> 
> For an implementation, I'd started with a power cut emulator which
> emulates paired pages. I'd probably do it in UBI, may be lower.

I actually implemented this kind of emulation in nandsim, though I
currently generate a kernel Oops when the emulated power cut occurs,
which is not really easy to use (still, both paired pages are corrupted
in the file used by nandsim to store the NAND data).
I'm considering changing the behavior to return -EROFS instead of
calling BUG(), but I'm still not sure upper layers expect this error...

I know that using an emulation layer is the only way to go if we want
to test the implementation, but I managed to generate the paired pages
problem manually by issuing a reset in the middle of a page program
operation. Even though this 'page program interruption' code is still
hacky, I think we'll be able to 'easily' validate the solution in
real-world use cases when it's ready.

> I'd
> also write a good UBI power-cut test application.

Not sure what you mean by a UBI power-cut application?

> And then I'd start
> playing with various implementation approaches.

Yep, that was the plan, I was hoping you could help me exclude some of
them, but I guess testing all of them is the only way to find the
best one :-/.

> I'd use the test-driven
> approach.

Hm, yep I guess that's the only way to test as many cases as possible,
but even with that I doubt I'll be able to think of all the cases that
could happen in the real world.

Thanks for the feedback.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 15:46   ` Boris Brezillon
@ 2015-09-17 16:47     ` Richard Weinberger
  2015-09-18  7:17       ` Andrea Scian
  0 siblings, 1 reply; 43+ messages in thread
From: Richard Weinberger @ 2015-09-17 16:47 UTC (permalink / raw)
  To: Boris Brezillon, dedekind1
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Qi Wang 王起 "(qiwang)", Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

Boris,

Am 17.09.2015 um 17:46 schrieb Boris Brezillon:
>> I'd
>> also write a good UBI power-cut test application.
> 
> Not sure what you mean by a UBI power-cut application?

UBI has a mechanism to emulate a power cut. Userspace
can trigger it. I assume Artem meant that we could extend this
mechanism to emulate paired page related issues in UBI.

>> And then I'd start
>> playing with various implementation approaches.
> 
> Yep, that was the plan, I was hoping you could help me exclude some of
> them, but I guess testing all of them is the only way to find the
> best one :-/.
> 
>> I'd use the test-driven
>> approach.
> 
> Hm, yep I guess that's the only way to test as much cases as possible,
> but even with that I doubt I'll be able to think of all the cases that
> could happen in real world.

Yeah, the crucial point is that we have to emulate paired pages very
well. Testing using emulation is nice but we need bare metal tests too.
I have one board with MLC NAND, I'll happily wear it to death. B-)

Thanks,
//richard


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 16:47     ` Richard Weinberger
@ 2015-09-18  7:17       ` Andrea Scian
  2015-09-18  7:41         ` Boris Brezillon
  2015-09-18  7:54         ` Artem Bityutskiy
  0 siblings, 2 replies; 43+ messages in thread
From: Andrea Scian @ 2015-09-18  7:17 UTC (permalink / raw)
  To: Richard Weinberger, Boris Brezillon, dedekind1
  Cc: linux-mtd, David Woodhouse, Brian Norris,
	Qi Wang 王起 "(qiwang)", Iwo Mergler,
	Jeff Lauruhn (jlauruhn)


Dear all,

Il 17/09/2015 18:47, Richard Weinberger ha scritto:
> Boris,
>
> Am 17.09.2015 um 17:46 schrieb Boris Brezillon:
>>> I'd
>>> also write a good UBI power-cut test application.
>> Not sure what you mean by a UBI power-cut application?
> UBI has a mechanism so emulate a power-cut. Userspace
> can trigger it. I assume Artem meant that we could extend the mechanism
> to emulate paired page related issues in UBI.
>
>>> And then I'd start
>>> playing with various implementation approaches.
>> Yep, that was the plan, I was hoping you could help me exclude some of
>> them, but I guess testing all of them is the only way to find the
>> best one :-/.
>>
>>> I'd use the test-driven
>>> approach.
>> Hm, yep I guess that's the only way to test as much cases as possible,
>> but even with that I doubt I'll be able to think of all the cases that
>> could happen in real world.
> Yeah, the crucial point is that we have to emulate paired pages very good.
> Testing using emulation is nice but we need bare metal tests too.
> I have one board with MLC NAND, I'll happily wear it do death. B-)

I think Boris has the same board somewhere ;-)

I perfectly understand the reasons for using nandsim (and a powercut 
simulator in general) but, AFAIK, the powercut problem is hard to 
"simulate" because the main issue is when the device sees a loss of 
power in the middle of an operation (page write or block erase).

I think that the best approach for bare metal testing is something like 
the following:
- connect a real powercut device (a simple relay that cuts the main 
power supply, driven by a GPIO)
- drive this device from inside the MTD code (probably with a random 
delay after issuing a NAND command)

I think that I (as DAVE) can provide this kind of hardware, with an easy 
plug-in connector on our hostboard (if those are the ones Richard 
speaks about).
Please let me know if you're interested in it; if so, I'll forward this 
request to our hardware guys and give you an official confirmation.

While running this kind of test, I would also increase the CPU load, to 
reduce the influence of the bypass capacitors (which may lead to wrong 
results in the generic case).

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-18  7:17       ` Andrea Scian
@ 2015-09-18  7:41         ` Boris Brezillon
  2015-09-18  7:54         ` Artem Bityutskiy
  1 sibling, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-18  7:41 UTC (permalink / raw)
  To: Andrea Scian
  Cc: Richard Weinberger, dedekind1, linux-mtd, David Woodhouse,
	Brian Norris, Qi Wang 王起 "(qiwang)",
	Iwo Mergler, Jeff Lauruhn (jlauruhn)

Hi Andrea,

On Fri, 18 Sep 2015 09:17:02 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> 
> Dear all,
> 
> Il 17/09/2015 18:47, Richard Weinberger ha scritto:
> > Boris,
> >
> > Am 17.09.2015 um 17:46 schrieb Boris Brezillon:
> >>> I'd
> >>> also write a good UBI power-cut test application.
> >> Not sure what you mean by a UBI power-cut application?
> > UBI has a mechanism so emulate a power-cut. Userspace
> > can trigger it. I assume Artem meant that we could extend the mechanism
> > to emulate paired page related issues in UBI.
> >
> >>> And then I'd start
> >>> playing with various implementation approaches.
> >> Yep, that was the plan, I was hoping you could help me exclude some of
> >> them, but I guess testing all of them is the only way to find the
> >> best one :-/.
> >>
> >>> I'd use the test-driven
> >>> approach.
> >> Hm, yep I guess that's the only way to test as much cases as possible,
> >> but even with that I doubt I'll be able to think of all the cases that
> >> could happen in real world.
> > Yeah, the crucial point is that we have to emulate paired pages very good.
> > Testing using emulation is nice but we need bare metal tests too.
> > I have one board with MLC NAND, I'll happily wear it do death. B-)
> 
> I think Boris has the same board somewhere ;-)

Yep :-).

> 
> I perfectly understand the reason why using nandsim (and powercut 
> simulator in general) but, AFAIK, the powercut problem is hard to 
> "simulate" because the main issue is when the device see a loss of power 
> in the middle of an operation (page write or block erase)

Well, it can be easily simulated in nandsim. Here is a dirty hack [1]
doing that. Of course my implementation is far from perfect, and a
lot of things are hardcoded (like the paired pages scheme), but I'm
pretty sure it is able to emulate the behavior of a power cut when a
specific page in a block is accessed.
The other reason we want to simulate it is that we need to test what
happens if a corruption occurs at specific places: corruption of UBI
EC, VID and payload data. This means that we need to be able to
simulate a powercut when a specific page (relative to a block) is
accessed.
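For reference, the core of the hack in [1] boils down to something like this standalone model (the names and the fixed +6 pairing rule are simplifications of mine, not the actual nandsim code): when the armed trigger page is programmed, both it and the already written lower page paired with it are trashed, mimicking what a real MLC chip does on power loss.

```c
#include <assert.h>
#include <string.h>

#define PAGES_PER_BLOCK 16
#define PAGE_SIZE 4

/* Toy emulation of a power cut during a page program. The pairing is
 * hardcoded to "lower = upper - 6" here just for illustration. */
struct emu_block {
	unsigned char pages[PAGES_PER_BLOCK][PAGE_SIZE];
	int trigger_page;   /* emulated power cut fires on this page */
};

/* Returns 0 on success, -1 when the emulated power cut fired. */
static int emu_program(struct emu_block *b, int page,
		       const unsigned char *data)
{
	if (page == b->trigger_page) {
		/* the page being programmed and the lower page paired
		 * with it both end up corrupted */
		memset(b->pages[page], 0x5a, PAGE_SIZE);
		if (page - 6 >= 0)
			memset(b->pages[page - 6], 0x5a, PAGE_SIZE);
		return -1;
	}
	memcpy(b->pages[page], data, PAGE_SIZE);
	return 0;
}
```

This is what lets a test script deterministically corrupt, say, the page holding the VID header by arming the trigger on its paired upper page.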

> 
> I think that the best approach for bare metal test is something like the 
> following:
> - connect a real powercut device (a simple relais that cut the main 
> power supply driven by a GPIO)
> - drive this device inside the MTD code (probably with random delay 
> after issuing a NAND command)

Hm, it seems like a complicated infrastructure. All you need to
trigger corruptions in paired pages is to interrupt the program
operation in the middle, and this can be done by simply sending a reset
command while it's taking place (I tested that method, and if I reset
the chip after tPROG / 2 it always corrupts both paired pages).

> 
> I think that I (as DAVE) can provide this kind of hardware, with an easy 
> plug-in connector on our hostboard (if those are the one that Richard 
> speak about).
> Please let me know if you're interesting in it, if so I'll forward this 
> request to our hardware guys and give you an official confirm.
> 
> While running this kind  of test, I would also increase CPU load, to 
> reduce bypass capacitor intrusion (which may lead to wrong result in a 
> generic case)

Of course, real world tests are welcome, but I don't think we can rely
on them while developing the solution.
Anyway, thanks for the offer.

Best Regards,

Boris


[1]http://code.bulix.org/73xjfn-88945

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-18  7:17       ` Andrea Scian
  2015-09-18  7:41         ` Boris Brezillon
@ 2015-09-18  7:54         ` Artem Bityutskiy
  2015-09-18  7:57           ` Bityutskiy, Artem
  2015-09-18  9:38           ` Andrea Scian
  1 sibling, 2 replies; 43+ messages in thread
From: Artem Bityutskiy @ 2015-09-18  7:54 UTC (permalink / raw)
  To: Andrea Scian, Richard Weinberger, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris,
	Qi Wang 王起 "(qiwang)", Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

Hi Andrea,

On Fri, 2015-09-18 at 09:17 +0200, Andrea Scian wrote:
> I perfectly understand the reason why using nandsim (and powercut 
> simulator in general) but, AFAIK, the powercut problem is hard to 
> "simulate" because the main issue is when the device see a loss of
> power 
> in the middle of an operation (page write or block erase)

This is right, and no doubt real power-cut testing is the most
important thing.

However, at the beginning, it is very hard to develop if you do not
have a quick way to verify your ideas. Simulation is exactly for this -
to make the first reliable draft. Once that works, you go to the second
stage - real HW testing.

Real HW testing requires a real power cycle, with no guarantee that the
power cut happens at the right moment, so you may spend hours emulating
just one paired-page case. Compare this to just running a script which
emulates 100 paired-page cases in 10 minutes. And you can emulate it
easily at the interesting places, not just during the main data writes.

So, to recap, I suggest emulation to make the first draft, and then
start heavy real testing to shape the final solution.

Artem.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-18  7:54         ` Artem Bityutskiy
@ 2015-09-18  7:57           ` Bityutskiy, Artem
  2015-09-18  9:38           ` Andrea Scian
  1 sibling, 0 replies; 43+ messages in thread
From: Bityutskiy, Artem @ 2015-09-18  7:57 UTC (permalink / raw)
  To: boris.brezillon@free-electrons.com, rnd4@dave-tech.it,
	richard@nod.at
  Cc: computersforpeace@gmail.com, qiwang@micron.com,
	Iwo.Mergler@netcommwireless.com, jlauruhn@micron.com,
	linux-mtd@lists.infradead.org, dwmw2@infradead.org

On Fri, 2015-09-18 at 10:54 +0300, Artem Bityutskiy wrote:
> Real HW testing requires a real power cycle, no guarantees power cut
> happens at the right moment, so you may spend hours emulating just
> one
> paired-page case.

Sorry, I meant reproducing just one paired-page case.

-- 
Best Regards,
Artem Bityutskiy
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 



* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-18  7:54         ` Artem Bityutskiy
  2015-09-18  7:57           ` Bityutskiy, Artem
@ 2015-09-18  9:38           ` Andrea Scian
  2015-09-24  1:57             ` Karl Zhang 张双锣 (karlzhang)
  1 sibling, 1 reply; 43+ messages in thread
From: Andrea Scian @ 2015-09-18  9:38 UTC (permalink / raw)
  To: dedekind1, Richard Weinberger, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris,
	Qi Wang 王起 "(qiwang)", Iwo Mergler,
	Jeff Lauruhn (jlauruhn)


Boris, Artem,

thanks to both of you for your detailed descriptions.
I'll follow this development; for sure I'll learn a lot :-)

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

Il 18/09/2015 09:54, Artem Bityutskiy ha scritto:
> Hi Andrea,
>
> On Fri, 2015-09-18 at 09:17 +0200, Andrea Scian wrote:
>> I perfectly understand the reason why using nandsim (and powercut
>> simulator in general) but, AFAIK, the powercut problem is hard to
>> "simulate" because the main issue is when the device see a loss of
>> power
>> in the middle of an operation (page write or block erase)
>
> This is right, and no doubts real power cuts testing is the most
> important thing.
>
> However, at the beginning, it is very hard to develop if you do not
> have a quick way to verify your ideas. Simulation is exactly for this -
> to make the first reliable draft. Once that work, you go to the second
> stage - real HW testing.
>
> Real HW testing requires a real power cycle, no guarantees power cut
> happens at the right moment, so you may spend hours emulating just one
> paired-page case. Compare this to just running a script, and it
> emulates you 100 paired-page cases during 10 minutes. And you can
> emulate it easily at the interesting places, not just during the main
> data writes.
>
> So, to recap, I suggest emulation to make the first draft, and then
> start heavy real testing to shape the final solution.
>
> Artem.
>


* RE: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-18  9:38           ` Andrea Scian
@ 2015-09-24  1:57             ` Karl Zhang 张双锣 (karlzhang)
  2015-09-24  6:31               ` Richard Weinberger
  2015-09-24  7:43               ` Boris Brezillon
  0 siblings, 2 replies; 43+ messages in thread
From: Karl Zhang 张双锣 (karlzhang) @ 2015-09-24  1:57 UTC (permalink / raw)
  To: Andrea Scian, dedekind1@gmail.com, Richard Weinberger,
	Boris Brezillon
  Cc: Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Qi Wang 王起 (qiwang), linux-mtd@lists.infradead.org,
	Brian Norris, David Woodhouse, shuangshuo@gmail.com

Hello

Actually, we are working on the paired pages problem on MLC chips too,
and we have developed a hardware power control board to simulate a real
power-loss cycle.

We have many ideas to share with you.


1. emulating the paired-page case
	HW: Develop a power control daughter board to control the power supply to the NAND, including voltage/ramp control.
	SW: Add a module in the NAND controller that uses the power board to shut down the NAND power when programming the paired upper page.

This makes it easy for us to reproduce the paired-page case; in order to guarantee the power-loss moment, we use FPGA logic
to control the power board and detect the status of the NAND.


2. EC/VID header corruption
	As Boris's excellent summary mentioned, "duplicate the EC header in the VID header": I also believe this is a good
	solution to protect the EC header, and we are implementing and testing it on MLC.

	For the VID header, I think skipping pages will waste too much capacity, and SLC mode in conjunction with GC will make PE cycling higher.

	We are developing another solution that stores the VID info in other pages' OOB area within the same block, because UBI does not
	use the OOB area and the ECC code does not always use all of it.


We are still developing and testing these solutions to protect the EC
and VID headers on MLC.

All the above is my limited work on paired pages, and I am open to any
new suggestions and cooperation.





* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-24  1:57             ` Karl Zhang 张双锣 (karlzhang)
@ 2015-09-24  6:31               ` Richard Weinberger
  2015-09-24  7:43               ` Boris Brezillon
  1 sibling, 0 replies; 43+ messages in thread
From: Richard Weinberger @ 2015-09-24  6:31 UTC (permalink / raw)
  To: Karl Zhang 张双锣 (karlzhang), Andrea Scian,
	dedekind1@gmail.com, Boris Brezillon
  Cc: Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Qi Wang 王起 (qiwang), linux-mtd@lists.infradead.org,
	Brian Norris, David Woodhouse, shuangshuo@gmail.com

Hi!

Am 24.09.2015 um 03:57 schrieb Karl Zhang 张双锣 (karlzhang):
> 	We are developing another solution to store VID info into other page's OOB area in its own block, because UBI does not 
> 	use OOB and ECC code always not use all OOB area. 

sorry, I really detest this idea. Not using the OOB area is one of the design principles behind UBI.
We have learned from JFFS and YAFFS that using the OOB area is problematic.
I'd give it up only if nothing else is applicable.

Thanks,
//richard


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-24  1:57             ` Karl Zhang 张双锣 (karlzhang)
  2015-09-24  6:31               ` Richard Weinberger
@ 2015-09-24  7:43               ` Boris Brezillon
  2015-09-24  9:44                 ` Stefan Roese
  1 sibling, 1 reply; 43+ messages in thread
From: Boris Brezillon @ 2015-09-24  7:43 UTC (permalink / raw)
  To: Karl Zhang 张双锣 (karlzhang)
  Cc: Andrea Scian, dedekind1@gmail.com, Richard Weinberger,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Qi Wang 王起 (qiwang), linux-mtd@lists.infradead.org,
	Brian Norris, David Woodhouse, shuangshuo@gmail.com

Hi Karl,

On Thu, 24 Sep 2015 01:57:41 +0000
Karl Zhang 张双锣 (karlzhang) <karlzhang@micron.com> wrote:

> Hello
> 
> Actually, we are working on the paired pages problem too. We have on MLC 
> chips and developed a hardware power control board to simulate the real power loss cycle.
> 
> We have many ideas to share with you.
> 
> 
> 1. emulating the paired-page case
> 	HW: Develop a power control daughter board to control the power supply to NAND, including voltage/ramp control.
> 	SW: Add a module in NAND controller, utilize power board to shut down NAND power when programming paired upper page.

Even the SW solution sounds like a HW solution to me :-).
When I say SW emulation, I mean something that doesn't require a reboot
or power-off operation.

> 		
>  This is easy for us to reproduce paired-page case, in order to guarantee the power loss moment, we add use FPGA logic 
> to control the power board and detect the status of NAND.

That's not so easy for me ;-).
Anyway, as I answered Andrea, testing on real HW is definitely
necessary, but doing it while evaluating the different options is not
as efficient as emulating the paired pages behavior.

> 
> 
> 2. EC/VID header corruption
> 	As Boris's excellent summary mentioned, "duplicate the EC header in the VID header", I also believe this is a good 
> 	solution to protect EC, and we are doing this and testing on MLC.
> 	
> 	For VID header, I think skip pages will waste too many capacity, and SLC mode conjugation with GC will make PE cycling higher.
> 
> 	We are developing another solution to store VID info into other page's OOB area in its own block, because UBI does not 
> 	use OOB and ECC code always not use all OOB area. 
> 

Hm, using the OOB area to do that is not such a good idea IMO, and I see
at least 2 reasons:

1/ You're supposing that you'll always have enough space to store the
VID info (the header currently takes 64 bytes, even if we could
compress it by removing the padding), and this is not necessarily true
(particularly with some NAND controllers which reserve as much space
as possible for ECC bytes).

2/ Most of the time the free OOB bytes are not ECC-protected (and even
if some controllers are able to protect a few of them, we're not sure
to have 64 protected OOB bytes per page), which means you're writing
something that can be corrupted by bitflips. I know that the header is
protected by a CRC, but that won't help recovering the data, it will
just let you know when the header is corrupted.
To summarize, you'd have to duplicate the VID info in all pages, and
even then you're not guaranteed to have a valid copy (though the more
pages you write, the lower the chance that all of them are corrupted).
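
To illustrate the CRC point: a checksum only detects corruption, it carries no redundancy to repair it. A minimal sketch (the header layout here is invented for illustration, not UBI's actual on-flash format, though UBI headers do use crc32):

```python
import zlib

def make_vid_header(payload: bytes) -> bytes:
    """Append a CRC32 so corruption can be detected (but not repaired)."""
    return payload + zlib.crc32(payload).to_bytes(4, 'big')

def check_vid_header(hdr: bytes) -> bool:
    payload, crc = hdr[:-4], int.from_bytes(hdr[-4:], 'big')
    return zlib.crc32(payload) == crc

hdr = make_vid_header(b'vol=3 lnum=17')
assert check_vid_header(hdr)

corrupted = bytes([hdr[0] ^ 0x01]) + hdr[1:]   # single bitflip
assert not check_vid_header(corrupted)         # detected...
# ...but nothing in the CRC tells us which bit to flip back.
```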

Regarding the fact that you'll have to lose/reserve at least one page
to protect the VID info, I actually had another (crazy?) idea (which I
didn't expose in my previous mail, because I thought it would be a major
change in the UBI design):
how about completely getting rid of the VID header and relying on the
fastmap database (not the current one, but something close to it) plus a
UBI journal logging the different changes (PEB erase, LEB map, ...)?
This way we should be able to recover all the information (even the EC
counters) even after a power cut. Of course, the problem of fastmap
volume corruption still remains.

> 
> 
> We are still developing and testing these solutions to protect EC and VID on MLC.

Okay, let us know about the results. Also, can you share the code
publicly, or is this something you want to keep private until you have
a stable version?

> 		
> 
> All the above is my limited work on paired pages, and I am open to any new suggestions and cooperation.

Thanks for sharing your ideas.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-24  7:43               ` Boris Brezillon
@ 2015-09-24  9:44                 ` Stefan Roese
  0 siblings, 0 replies; 43+ messages in thread
From: Stefan Roese @ 2015-09-24  9:44 UTC (permalink / raw)
  To: Boris Brezillon, Karl Zhang 张双锣 (karlzhang)
  Cc: Iwo Mergler, Jeff Lauruhn (jlauruhn), dedekind1@gmail.com,
	Richard Weinberger, shuangshuo@gmail.com, Andrea Scian,
	Qi Wang 王起 (qiwang), linux-mtd@lists.infradead.org,
	Brian Norris, David Woodhouse

Hi,

On 24.09.2015 09:43, Boris Brezillon wrote:
>> 2. EC/VID header corruption
>> 	As Boris's excellent summary mentioned, "duplicate the EC header in the VID header", I also believe this is a good
>> 	solution to protect EC, and we are doing this and testing on MLC.
>> 	
>> 	For VID header, I think skip pages will waste too many capacity, and SLC mode conjugation with GC will make PE cycling higher.
>>
>> 	We are developing another solution to store VID info into other page's OOB area in its own block, because UBI does not
>> 	use OOB and ECC code always not use all OOB area.
>>
>
> Hm, using the OOB area to do that is not such a good idea IMO, and I see
> at least 2 reasons:
>
> 1/ You're supposing that you'll always have enough space to store the
> VID info (the header is currently taking 64 bytes, even if we could
> compress it by removing the padding), and this is not necessarily true
> (particularly with some NAND controllers which are allowing as much
> space as possible for ECC bytes).
>
> 2/ Most of the time ECC bytes are not protected (and even if some
> controllers are able to protect a few of them, we're not sure to have 64
> bytes of protected OOB bytes per page), which means you're writing
> something that can be corrupted by bitflips. I know that the header is
> protected by a CRC, but that won't help recovering the data, just let
> you know when the header is corrupted.

And 3:
Only NAND provides an OOB area. Other flash devices like parallel
or SPI NOR don't. And we definitely want to continue supporting
platforms with such flash devices and UBI (and UBIFS).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
       [not found] <A765B125120D1346A63912DDE6D8B6310BF4CAA8@NTXXIAMBX02.xacn.micron.com>
@ 2015-09-25  7:30 ` Boris Brezillon
  2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
  2015-09-25  8:30   ` Karl Zhang 张双锣 (karlzhang)
  0 siblings, 2 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-25  7:30 UTC (permalink / raw)
  To: Bean Huo 霍斌斌 (beanhuo)
  Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
	Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
	rnd4@dave-tech.it, dwmw2@infradead.org

Hi,

On Fri, 25 Sep 2015 00:43:37 +0000
Bean Huo 霍斌斌 (beanhuo) <beanhuo@micron.com> wrote:

> > And 3:
> > Only NAND provides an OOB area. Other flash devices like parallel
> > or SPI NOR don't. And we definitely want to continue supporting
> > platforms with such flash devices and UBI (and UBIFS).
> > 
> > Thanks,
> > Stefan
> > 
> For MLC NAND paired pages issue, we have developed two methods to solve it in UBI layer,
> We hope that every expert on UBI/UBIFS can give more suggestions about how to improve and perfect it.
> I think this week I can submit the first solution patch. I am currently working on coding style.

Are you referring to the suggestion proposed by Karl (using the OOB
area to store UBI metadata), or is this something else?

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-25  7:30 ` Boris Brezillon
@ 2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
  2015-09-25  8:35     ` Richard Weinberger
  2015-09-25  8:48     ` Boris Brezillon
  2015-09-25  8:30   ` Karl Zhang 张双锣 (karlzhang)
  1 sibling, 2 replies; 43+ messages in thread
From: Bean Huo 霍斌斌 (beanhuo) @ 2015-09-25  8:25 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
	Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
	rnd4@dave-tech.it, dwmw2@infradead.org,
	Karl Zhang 张双锣 (karlzhang)

> Bean Huo 霍斌斌 (beanhuo) <beanhuo@micron.com> wrote:
> 
> > > And 3:
> > > Only NAND provides an OOB area. Other flash devices like parallel or
> > > SPI NOR don't. And we definitely want to continue supporting
> > > platforms with such flash devices and UBI (and UBIFS).
> > >
> > > Thanks,
> > > Stefan
> > >
> > For MLC NAND paired pages issue, we have developed two methods to
> > solve it in UBI layer, We hope that every expert on UBI/UBIFS can give more
> suggestions about how to improve and perfect it.
> > I think ,this week, I can submit first solution patch out. Currently do coding
> style.
> 
> Are you referring to the suggestion proposed by Karl (using the OOB area to
> store UBI metadata), or is this something else?
> 
> Best Regards,
> 
> Boris
> 
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com

Hi Boris,
The NAND OOB area is mostly dedicated to ECC values, and the
user-available area in the OOB is not covered by the ECC protection
mechanism, so saving EC or VID information in the OOB is not a perfect
solution. But if there is additional space, and this space can be
covered by ECC, we can try it.
By the way, I want to allocate a new internal volume to solve the
paired pages issue. What do you think about this?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-25  7:30 ` Boris Brezillon
  2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
@ 2015-09-25  8:30   ` Karl Zhang 张双锣 (karlzhang)
  2015-09-25  8:56     ` Boris Brezillon
  1 sibling, 1 reply; 43+ messages in thread
From: Karl Zhang 张双锣 (karlzhang) @ 2015-09-25  8:30 UTC (permalink / raw)
  To: Boris Brezillon, Bean Huo 霍斌斌 (beanhuo)
  Cc: Iwo Mergler, computersforpeace@gmail.com, dedekind1@gmail.com,
	richard@nod.at, shuangshuo@gmail.com, rnd4@dave-tech.it,
	Jeff Lauruhn (jlauruhn), linux-mtd@lists.infradead.org,
	Stefan Roese, dwmw2@infradead.org

Hi Boris

Some further suggestions.

We know that using the OOB area to store UBI metadata (backup data) is
not a good idea; it was just a thought, and we also did a little work to
implement and verify it.

The paired page issue on MLC is troublesome, and we are trying our best
to deal with it, even if that means attempting some bad ideas.

> > Only NAND provides an OOB area. 
Some devices (NOR) do not have an OOB, and at the same time they do not
have the paired page issue, right? I think they do not need to store
redundant metadata anywhere.
AFAIK, only MLC (and maybe TLC) NAND needs to store another copy of some
metadata, and to try to protect it with ECC.

But if the OOB has some unused area (at least 48 bytes) and the chip has
the paired page issue, could we take a little advantage of it to reduce
the UBI failure rate?

The above is just some thinking; I hope we can find some better
solutions. As Bean said, most customers do not want UBIFS to crash, and
neither do we.

Thank you for your patience in pointing out my wrong opinions.


BR



-----Original Message-----
From: linux-mtd [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Boris Brezillon
Sent: Friday, September 25, 2015 3:30 PM
To: Bean Huo 霍斌斌 (beanhuo)
Cc: Iwo Mergler; computersforpeace@gmail.com; dedekind1@gmail.com; richard@nod.at; shuangshuo@gmail.com; rnd4@dave-tech.it; Jeff Lauruhn (jlauruhn); linux-mtd@lists.infradead.org; Stefan Roese; dwmw2@infradead.org
Subject: Re: UBI/UBIFS: dealing with MLC's paired pages

Hi,

On Fri, 25 Sep 2015 00:43:37 +0000
Bean Huo 霍斌斌 (beanhuo) <beanhuo@micron.com> wrote:

> > And 3:
> > Only NAND provides an OOB area. Other flash devices like parallel or 
> > SPI NOR don't. And we definitely want to continue supporting 
> > platforms with such flash devices and UBI (and UBIFS).
> > 
> > Thanks,
> > Stefan
> > 
> For MLC NAND paired pages issue, we have developed two methods to 
> solve it in UBI layer, We hope that every expert on UBI/UBIFS can give more suggestions about how to improve and perfect it.
> I think ,this week, I can submit first solution patch out. Currently do coding style.

Are you referring to the suggestion proposed by Karl (using the OOB area to store UBI metadata), or is this something else?

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
@ 2015-09-25  8:35     ` Richard Weinberger
  2015-09-25  8:48     ` Boris Brezillon
  1 sibling, 0 replies; 43+ messages in thread
From: Richard Weinberger @ 2015-09-25  8:35 UTC (permalink / raw)
  To: Bean Huo 霍斌斌 (beanhuo), Boris Brezillon
  Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
	Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	dedekind1@gmail.com, shuangshuo@gmail.com, rnd4@dave-tech.it,
	dwmw2@infradead.org,
	Karl Zhang 张双锣 (karlzhang)

Am 25.09.2015 um 10:25 schrieb Bean Huo 霍斌斌 (beanhuo):
> For NAND OOB area, it is dedicated for ECC value, at the same time,
> User available area in OOB is not covered by ECC protection mechanism.
> so saving EC or VID information in OOB is not a perfect solution.
> But if there is additional space, and this space can be covered by ECC,
> We can try it.

As I said, I'm *really* against using OOB and see it as the absolute last choice.

> By the way, I want to allocate a new internal volume to solve paired pages issue.
> How do you think about this ?

More details please. :)
Adding a new internal volume to UBI is not a big deal.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
  2015-09-25  8:35     ` Richard Weinberger
@ 2015-09-25  8:48     ` Boris Brezillon
  1 sibling, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-25  8:48 UTC (permalink / raw)
  To: Bean Huo 霍斌斌 (beanhuo)
  Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
	Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
	rnd4@dave-tech.it, dwmw2@infradead.org,
	Karl Zhang 张双锣 (karlzhang)

Hi Bean,

On Fri, 25 Sep 2015 08:25:44 +0000
Bean Huo 霍斌斌 (beanhuo) <beanhuo@micron.com> wrote:

> > Bean Huo 霍斌斌 (beanhuo) <beanhuo@micron.com> wrote:
> > 
> > > > And 3:
> > > > Only NAND provides an OOB area. Other flash devices like parallel or
> > > > SPI NOR don't. And we definitely want to continue supporting
> > > > platforms with such flash devices and UBI (and UBIFS).
> > > >
> > > > Thanks,
> > > > Stefan
> > > >
> > > For MLC NAND paired pages issue, we have developed two methods to
> > > solve it in UBI layer, We hope that every expert on UBI/UBIFS can give more
> > suggestions about how to improve and perfect it.
> > > I think ,this week, I can submit first solution patch out. Currently do coding
> > style.
> > 
> > Are you referring to the suggestion proposed by Karl (using the OOB area to
> > store UBI metadata), or is this something else?
> > 
> > Best Regards,
> > 
> > Boris
> > 
> > --
> > Boris Brezillon, Free Electrons
> > Embedded Linux and Kernel engineering
> > http://free-electrons.com
> 
> Hi, Boris
> For NAND OOB area, it is dedicated for ECC value, at the same time,
> User available area in OOB is not covered by ECC protection mechanism.
> so saving EC or VID information in OOB is not a perfect solution.
> But if there is additional space, and this space can be covered by ECC,
> We can try it.

The problem is, we want the solution to be as generic as possible, and
not all NAND/ECC controllers are able to protect OOB bytes :-/.

> By the way, I want to allocate a new internal volume to solve paired pages issue.
> How do you think about this ?

I actually had a similar idea, but instead of creating a new metadata
volume, I wanted to reuse the fastmap one.
My idea was not to duplicate data from already programmed pages in
this volume (not sure this was your idea either, could you tell us
more about what you had in mind?), but instead use it to log UBI
operations like
- PEB erasure: to save the EC counter and recover it if a power-cut
  occurs after the erasure but before the EC header is written to the
  block. Doing that would also partly solve the 'unstable bits' issue,
  since we would be able to know which block was being erased before
  the power-cut occurred.
- LEB map: to save the lnum <-> pnum association and recover from VID
  header corruption.
- Last written block (still not happy with that): to log which block
  the last write operation took place on. This would help solve the
  'unstable bits' problem, but it would also add a non-negligible
  overhead if writes are not consecutive (not done in the same LEB).
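
As a sketch of how the erase/map records above could be replayed at attach time (the record layout is hypothetical, and the 'last written block' record is left out):

```python
# Hypothetical journal records: ('erase', peb, ec) and ('map', lnum, peb).
def replay(journal):
    """Rebuild EC counters and the LEB->PEB table from a log of events."""
    ec = {}
    leb_to_peb = {}
    for rec in journal:
        if rec[0] == 'erase':
            _, peb, count = rec
            ec[peb] = count
            # an erased PEB is no longer mapped to any LEB
            leb_to_peb = {l: p for l, p in leb_to_peb.items() if p != peb}
        elif rec[0] == 'map':
            _, lnum, peb = rec
            leb_to_peb[lnum] = peb
    return ec, leb_to_peb

journal = [('erase', 0, 1), ('map', 10, 0),
           ('erase', 1, 5), ('map', 11, 1),
           ('erase', 0, 2)]          # LEB 10's PEB was recycled
ec, mapping = replay(journal)
assert ec == {0: 2, 1: 5}
assert mapping == {11: 1}
```

The replay gives back both the EC information and the lnum <-> pnum association without any per-PEB VID header.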

Other ideas are welcome.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-25  8:30   ` Karl Zhang 张双锣 (karlzhang)
@ 2015-09-25  8:56     ` Boris Brezillon
  0 siblings, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-25  8:56 UTC (permalink / raw)
  To: Karl Zhang 张双锣 (karlzhang)
  Cc: Bean Huo 霍斌斌 (beanhuo), Iwo Mergler,
	computersforpeace@gmail.com, dedekind1@gmail.com, richard@nod.at,
	shuangshuo@gmail.com, rnd4@dave-tech.it, Jeff Lauruhn (jlauruhn),
	linux-mtd@lists.infradead.org, Stefan Roese, dwmw2@infradead.org

On Fri, 25 Sep 2015 08:30:23 +0000
Karl Zhang 张双锣 (karlzhang) <karlzhang@micron.com> wrote:

> Hi Boris
> 
> Something else suggestions. 
> 
> We know that using the OOB area to store UBI metadata(Backup data) is not a good idea, this is just a thinking, and also do a little work to implement it and verification .
> 
> For paired page issue from MLC is troublesome, we are trying our best to deal with it. Although attempt to use some bad ideas.
> 
> > > Only NAND provides an OOB area. 
> Some devices(NOR) do not have OOB, simultaneously, they do not have paired page issue, right?  I think they do not need to store a redundant metadata to anywhere.
> AFAIK , only MLC(TLC maybe) NAND need store some metadata for another copy, and try to protected it by ECC.
> 
> But, if OOB have some unused area (at least 48 bytes) and the chip has paired page issue , could we take a little advantage of them  to reduce UBI fail rate? 
> 
> Above, is just some thinking , wish we can have some better solutions. 
> As Bean said, most customer do not want UBIFS crash, and so do we.
> 
> Thanks to your patient to point out my wrong opinion.

Don't be sorry, and there's no wrong opinion: the whole point of this
discussion is sharing our ideas and arguing to find the best solution.

Regarding the use of extra OOB bytes, I'll follow Richard's opinion:
let's keep it as a 'last resort' solution if we fail to find an
acceptable alternative.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
  2015-09-17 15:20 ` Artem Bityutskiy
@ 2015-09-29 11:19 ` Richard Weinberger
  2015-09-29 12:51   ` Boris Brezillon
  2015-10-23  8:14 ` Boris Brezillon
  2015-10-28 12:06 ` Artem Bityutskiy
  3 siblings, 1 reply; 43+ messages in thread
From: Richard Weinberger @ 2015-09-29 11:19 UTC (permalink / raw)
  To: Boris Brezillon, Artem Bityutskiy
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Qi Wang 王起 (qiwang), Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

Hi!

Am 17.09.2015 um 15:22 schrieb Boris Brezillon:
> Hello,
> 
> I'm currently working on the paired pages problem we have on MLC chips.
> I remember discussing it with Artem earlier this year when I was
> preparing my talk for ELC.
> 
> I now have some time I can spend working on this problem and I started
> looking at how this can be solved.
> 
> First let's take a look at the UBI layer.
> There's one basic thing we have to care about: protecting UBI metadata.
> There are two kind of metadata:
> 1/ those stored at the beginning of each erase block (EC and VID
>    headers)
> 2/ those stored in specific volumes (layout and fastmap volumes)
> 
> We don't have to worry about #2 since those are written using atomic
> update, and atomic updates are immune to this paired page corruption
> problem (either the whole write is valid, or none of it is valid).
> 
> This leaves problem #1.
> For this case, Artem suggested to duplicate the EC header in the VID
> header so that if page 0 is corrupted we can recover the EC info from
> page 1 (which will contain both VID and EC info).
> Doing that is fine for dealing with EC header corruption, since, AFAIK,
> none of the NAND vendors are pairing page 0 with page 1.
> Still remains the VID header corruption problem. To prevent that we
> still have several solutions:
> a/ skip the page paired with the VID header. This is doable and can be
>    hidden from UBI users, but it also means that we're losing another
>    page for metadata (not a negligible overhead)
> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
>    seems the right place to put that in, since fastmap is already
>    storing those information for almost all blocks. Still we would have
>    to modify fastmap a bit to store information about all erase blocks
>    and not only those that are not part of the fastmap pool.
>    Also, updating that in real-time would require using a log approach,
>    instead of the atomic update currently used by fastmap when it runs
>    out of PEBs in it's free PEB pool. Note that the log approach does
>    not have to be applied to all fastmap data (we just need it for the
>    PEB <-> LEB info).
>    Another off-topic note regarding the suggested log approach: we
>    could also use it to log which PEB was last written/erased, and use
>    that to handle the unstable bits issue.
> c/ (also suggested by Artem) delay VID write until we have enough data
>    to write on the LEB, and thus guarantee that it cannot be corrupted
>    (at least by programming on the paired page ;-)) anymore.
>    Doing that would also require logging data to be written on those
>    LEBs somewhere, not to mention the impact of copying the data twice
>    (once in the log, and then when we have enough data, in the real
>    block).

Let's start with UBI; as soon as it is stable on MLC NAND we can focus on
UBIFS.

Solution a) sounds very promising to me as it can be implemented easily,
and losing another page for metadata is IMHO acceptable on MLC.
Especially as MLC NANDs are anyway bigger and cheaper than SLC.

b) is tricky as fastmap follows the design principle that UBI can fall
back to a full scan if the fastmap is corrupted or a self check fails.
If the ability to full scan suddenly depends on fastmap it can become
messy.

In terms of computer science, c) is the most elegant solution, but
converting UBI to a log-based "block layer" is not trivial, and as you
wrote the write overhead is not negligible.

So, I'd vote for a) and see how well it does in our powercut tests. :-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-29 11:19 ` Richard Weinberger
@ 2015-09-29 12:51   ` Boris Brezillon
  0 siblings, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-29 12:51 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Qi Wang 王起 (qiwang), Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

Hi Richard,

On Tue, 29 Sep 2015 13:19:01 +0200
Richard Weinberger <richard@nod.at> wrote:

> Hi!
> 
> Am 17.09.2015 um 15:22 schrieb Boris Brezillon:
> > Hello,
> > 
> > I'm currently working on the paired pages problem we have on MLC chips.
> > I remember discussing it with Artem earlier this year when I was
> > preparing my talk for ELC.
> > 
> > I now have some time I can spend working on this problem and I started
> > looking at how this can be solved.
> > 
> > First let's take a look at the UBI layer.
> > There's one basic thing we have to care about: protecting UBI metadata.
> > There are two kind of metadata:
> > 1/ those stored at the beginning of each erase block (EC and VID
> >    headers)
> > 2/ those stored in specific volumes (layout and fastmap volumes)
> > 
> > We don't have to worry about #2 since those are written using atomic
> > update, and atomic updates are immune to this paired page corruption
> > problem (either the whole write is valid, or none of it is valid).
> > 
> > This leaves problem #1.
> > For this case, Artem suggested to duplicate the EC header in the VID
> > header so that if page 0 is corrupted we can recover the EC info from
> > page 1 (which will contain both VID and EC info).
> > Doing that is fine for dealing with EC header corruption, since, AFAIK,
> > none of the NAND vendors are pairing page 0 with page 1.
> > Still remains the VID header corruption problem. To prevent that we
> > still have several solutions:
> > a/ skip the page paired with the VID header. This is doable and can be
> >    hidden from UBI users, but it also means that we're losing another
> >    page for metadata (not a negligible overhead)
> > b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
> >    seems the right place to put that in, since fastmap is already
> >    storing those information for almost all blocks. Still we would have
> >    to modify fastmap a bit to store information about all erase blocks
> >    and not only those that are not part of the fastmap pool.
> >    Also, updating that in real-time would require using a log approach,
> >    instead of the atomic update currently used by fastmap when it runs
> >    out of PEBs in it's free PEB pool. Note that the log approach does
> >    not have to be applied to all fastmap data (we just need it for the
> >    PEB <-> LEB info).
> >    Another off-topic note regarding the suggested log approach: we
> >    could also use it to log which PEB was last written/erased, and use
> >    that to handle the unstable bits issue.
> > c/ (also suggested by Artem) delay VID write until we have enough data
> >    to write on the LEB, and thus guarantee that it cannot be corrupted
> >    (at least by programming on the paired page ;-)) anymore.
> >    Doing that would also require logging data to be written on those
> >    LEBs somewhere, not to mention the impact of copying the data twice
> >    (once in the log, and then when we have enough data, in the real
> >    block).
> 
> Let's start with UBI, as soon it is stable on MLC NAND we can focus on
> UBIFS.

I wish it was that simple, but the decision we take at the UBI layer
will most likely impact the choices we'll have at the UBIFS layer.
So yes, focusing on the UBI layer for the implementation sounds
sensible, but I think we have to carefully think about the solution we
want to test first, and what are the impact on the UBIFS implementation.

> 
> Solution a) sounds very promising to me as it can be implemented easily
> and losing another page for metadata is IMHO acceptable on MLC.
> Especially as MLC NANDs are anyways bigger and cheaper than SLC.

Yes, solution a) is definitely the simplest one (and probably the one
I'll try first). Regarding the overhead, we go from
2/number_of_pages_per_block to 4/number_of_pages_per_block (not counting
the overhead of internal volumes, since they should be pretty much the
same in both cases), so I wouldn't say it's negligible even for MLCs. But
I agree that having a reliable solution at the cost of more overhead
can be a good match for our first implementation.
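
For concreteness, here is the overhead arithmetic for an example geometry (128 pages per block is just an illustration; real parts vary):

```python
pages_per_block = 128  # example MLC geometry; real parts vary

current_overhead = 2 / pages_per_block   # EC page + VID page
option_a_overhead = 4 / pages_per_block  # plus the two skipped paired pages

assert current_overhead == 0.015625      # ~1.6% of each block
assert option_a_overhead == 0.03125      # ~3.1% of each block
```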

> 
> b) is tricky as fastmap follows the design principle that UBI can fall
> back to a full scan if the fastmap is corrupted or a self check fails.
> If the ability to full scan suddenly depends on fastmap it can become
> messy.

We are only talking about paired pages corruption here, so I hope both
pieces of information will not be corrupted at the same time: the VID
header should
be valid unless a power-cut occurred while writing on the page paired
with the VID header one, which means fastmap volume should still be
valid (unless we are experiencing data-retention issues, which can be
true for the SLC case too). And even if the LEB and fastmap information
are corrupted, we should be able to reconstruct it and discard the LEB
with the corrupted VID header.
Anyway, this approach is way more complicated to implement, and I
reserve it as a "going further" topic ;-).

> 
> In terms of computer science c) is the most elegant solution but converting
> UBI to a log based "block layer" is not trivial and as you wrote the write
> overhead is not negligible.

Hm, I don't know if it's the most elegant solution (I'm still concerned
about the write overhead caused by the extra copy step, though solution
b) has some overhead too), but I agree that implementing that one is not
trivial.

> 
> So, I'd vote for a) and see how well it does in our powercut tests. :-)

a) it is. I'll focus on that solution first.

Thanks for the advice.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
  2015-09-17 15:20 ` Artem Bityutskiy
  2015-09-29 11:19 ` Richard Weinberger
@ 2015-10-23  8:14 ` Boris Brezillon
  2015-10-27 20:16   ` Richard Weinberger
  2015-10-28 12:24   ` Artem Bityutskiy
  2015-10-28 12:06 ` Artem Bityutskiy
  3 siblings, 2 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-23  8:14 UTC (permalink / raw)
  To: Artem Bityutskiy, Richard Weinberger
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

Hi,

Here is a quick status update of my progress and a few questions to
UBI/UBIFS experts.

On Thu, 17 Sep 2015 15:22:40 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> Hello,
> 
> I'm currently working on the paired pages problem we have on MLC chips.
> I remember discussing it with Artem earlier this year when I was
> preparing my talk for ELC.
> 
> I now have some time I can spend working on this problem and I started
> looking at how this can be solved.
> 
> First let's take a look at the UBI layer.
> There's one basic thing we have to care about: protecting UBI metadata.
> There are two kind of metadata:
> 1/ those stored at the beginning of each erase block (EC and VID
>    headers)
> 2/ those stored in specific volumes (layout and fastmap volumes)
> 
> We don't have to worry about #2 since those are written using atomic
> update, and atomic updates are immune to this paired page corruption
> problem (either the whole write is valid, or none of it is valid).
> 
> This leaves problem #1.
> For this case, Artem suggested to duplicate the EC header in the VID
> header so that if page 0 is corrupted we can recover the EC info from
> page 1 (which will contain both VID and EC info).
> Doing that is fine for dealing with EC header corruption, since, AFAIK,
> none of the NAND vendors are pairing page 0 with page 1.
> Still remains the VID header corruption problem. To prevent that we
> still have several solutions:
> a/ skip the page paired with the VID header. This is doable and can be
>    hidden from UBI users, but it also means that we're losing another
>    page for metadata (not a negligible overhead)
> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
>    seems the right place to put that in, since fastmap is already
>    storing those information for almost all blocks. Still we would have
>    to modify fastmap a bit to store information about all erase blocks
>    and not only those that are not part of the fastmap pool.
>    Also, updating that in real-time would require using a log approach,
>    instead of the atomic update currently used by fastmap when it runs
>    out of PEBs in it's free PEB pool. Note that the log approach does
>    not have to be applied to all fastmap data (we just need it for the
>    PEB <-> LEB info).
>    Another off-topic note regarding the suggested log approach: we
>    could also use it to log which PEB was last written/erased, and use
>    that to handle the unstable bits issue.
> c/ (also suggested by Artem) delay VID write until we have enough data
>    to write on the LEB, and thus guarantee that it cannot be corrupted
>    (at least by programming on the paired page ;-)) anymore.
>    Doing that would also require logging data to be written on those
>    LEBs somewhere, not to mention the impact of copying the data twice
>    (once in the log, and then when we have enough data, in the real
>    block).
> 
> I don't have any strong opinion about which solution is the best, also
> I'm maybe missing other aspects or better solutions, so feel free to
> comment on that and share your thoughts.

I decided to go for the simplest solution (but I can't promise I won't
change my mind if this approach appears to be wrong), which is using a
LEB in either MLC or SLC mode. In SLC mode, only the first page of each
pair is used, which completely addresses the paired pages problem.
For now the SLC mode logic is hidden in the MTD/NAND layers, which
provide functions to write/read in SLC mode.

Thanks to this differentiation, UBI is now exposing two kind of LEBs:
- the secure (small) LEBS (those accessed in SLC mode)
- the unsecure (big) LEBS (those accessed in MLC mode)

The secure LEBs are marked as such with a flag in the VID header, which
allows tracking secure/unsecure LEBs and controlling the maximum size a
UBI user can read/write from/to a LEB.
This approach assumes LEB 0 and 1 are never paired together (which
AFAICT is always true), because VID is stored on page 1 and we need the
secure_flag information to know how to access the LEB (SLC or MLC mode).
Of course I expose a few new helpers in the kernel API, and we'll
probably have to do the same for the ioctl interface too if this
approach is validated.

That's all I got for the UBI layer.
Richard, Artem, any feedback so far?

> 
> That's all for the UBI layer. We will likely need new functions (and
> new fields in existing structures) to help UBI users deal with MLC
> NANDs: for example a field exposing the storage type or a function
> helping users skip one (or several) blocks to secure the data they have
> written so far. Anyway, those are things we can discuss after deciding
> which approach we want to take.
> 
> Now, let's talk about the UBIFS layer. We are facing pretty much the
> same problem in there: we need to protect the data we have already
> written from time to time.
> AFAIU (correct me if I'm wrong), data should be secure when we sync the
> file system, or commit the UBIFS journal (feel free to correct me if
> I'm not using the right terms in my explanation).
> As explained earlier, the only way to secure data is to skip some pages
> (those that are paired with the already written ones).
> 
> I see two approaches here (there might be more):
> 1/ do not skip any pages until we are asked to secure the data, and
>    then skip as many pages as needed to ensure nobody can ever corrupt
>    the data. With this approach you can lose a non-negligible amount
>    of space. For example, with this paired pages scheme [1], if you
>    only write on page 2 and want to secure your data, you'll have
>    to skip pages 3 to 8.
> 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a
>    block). With this solution you always lose half the NAND capacity,
>    but in case of small writes, it's still more efficient than #1.
>    Of course using that solution is not acceptable, because you'll
>    only be able to use half the NAND capacity, but the plan is to use
>    it in conjunction with the GC, so that from time to time UBIFS
>    data chunks/nodes can be put in a single erase block without
>    skipping half the pages.
>    Note that currently the GC does not work this way: it tries to
>    collect chunks one by one and write them to the journal to free a
>    dirty LEB. What we would need here is a way to collect enough data
>    to fill an entire block and after that release the LEBs that were
>    previously using half the LEB capacity.
> 
> Of course both of those solutions imply marking the skipped regions
> as dirty so that the GC can account for the padded space. For #1 we
> should probably also use padding nodes to reflect how much space is lost
> on the media, though I'm not sure how this can be done. For #2, we may
> have to differentiate 'full' and 'half' LEBs in the LPT.

If you followed my un/secure LEB approach described above, you probably
know that we don't have many solutions for the UBIFS layer.

My idea here is to use a garbage collection mechanism which will
consolidate data LEBs (LEBs containing valid data nodes).
By default all LEBs are used in secure (SLC) mode, which makes the
UBIFS layer reliable. From time to time the consolidation GC will
choose a few secure LEBs and move their nodes to an unsecure LEB.
The idea is to fill the entire unsecure LEB, so that we never write on
it afterwards, thus preventing any paired page corruption. Once this
copy is finished we can release/unmap the secure LEBs we have
consolidated (after adding a bud node to reference the unsecure LEB of
course).

Here are a few details about the implementation I started to develop
(questions will come after ;-)).
I added a new category (called LPROPS_FULL) to track the LEBs that are
almost full (lp->dirty + lp->free < leb_size / 4), so that we can
easily consolidate 2 to 3 full LEBs into a single unsecure LEB.
The consolidation is done by packing as many nodes as possible into an
unsecure LEB, and a single pass should result in at least one freed
LEB: the consolidation moves nodes from at least 2 secure LEBs into a
single one, so you free 2 LEBs but need to keep one for the next
consolidation iteration, hence the single LEB freed.
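As a sanity check of the arithmetic above, here is a stand-alone
sketch. None of this is real UBIFS code: the function names are made
up, and the one-quarter threshold and greedy packing simply mirror the
description.

```c
#include <assert.h>

/* A LEB is categorized as "full" (LPROPS_FULL in the proposal) when
 * its dirty + free space is below a quarter of the LEB size. */
static int leb_is_full(int dirty, int free_space, int leb_size)
{
	return dirty + free_space < leb_size / 4;
}

/* Greedily count how many full secure LEBs can be packed into one
 * unsecure LEB. Consolidating fewer than two frees nothing, since one
 * LEB must be kept for the next iteration, so report 0 in that case. */
static int consolidation_count(const int *used, int n, int unsecure_size)
{
	int total = 0, i;

	for (i = 0; i < n; i++) {
		if (total + used[i] > unsecure_size)
			break;
		total += used[i];
	}
	return i >= 2 ? i : 0;
}
```

Since a secure LEB holds at most half of the unsecure LEB capacity,
two "full" LEBs always fit, and a third fits only when the candidates
are not completely full -- which matches the "2 to 3 full LEBs"
estimate.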

Now comes the questions to the UBIFS experts:
- should I create a new journal head to do what's described above?
  AFAICT I can't use the GC head, because the GC can still do its job
  in parallel with the consolidation-GC, and the GC LEB might already be
  filled with some data nodes, right?
  I thought about using the data head, but again, it might already
  point to a partially filled data LEB.
  I added a journal head called BIG_DATA_HEAD, but I'm not sure this is
  acceptable, so let me know what you think about that.

- when should we run the consolidation-GC? After the standard GC
  pass, when this one didn't make any progress, or should we launch
  it as soon as we have enough full LEBs to fill an unsecure LEB? The
  second solution might have a small impact on the performance of an
  empty FS (below half the capacity), but OTOH, it will scale better
  when the FS size exceeds this limit (no need to run the GC each time
  we want to write new data).

- I still need to understand the races between TNC and GC, since I'm
  pretty sure I'll face the same kind of problems with the
  consolidation-GC. Can someone explain that to me, or should I dig
  further into the code :-)?

I'm pretty sure I overlooked a lot of problems here; also note that my
implementation is not finished yet, so this consolidation-GC concept
has not been validated. If you see anything that could defeat this
approach, please let me know so that I can adjust my development.

Thanks.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-23  8:14 ` Boris Brezillon
@ 2015-10-27 20:16   ` Richard Weinberger
  2015-10-28  9:24     ` Boris Brezillon
  2015-10-28 12:24   ` Artem Bityutskiy
  1 sibling, 1 reply; 43+ messages in thread
From: Richard Weinberger @ 2015-10-27 20:16 UTC (permalink / raw)
  To: Boris Brezillon, Artem Bityutskiy
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

Boris,

Am 23.10.2015 um 10:14 schrieb Boris Brezillon:
>> I'm currently working on the paired pages problem we have on MLC chips.
>> I remember discussing it with Artem earlier this year when I was
>> preparing my talk for ELC.
>>
>> I now have some time I can spend working on this problem and I started
>> looking at how this can be solved.
>>
>> First let's take a look at the UBI layer.
>> There's one basic thing we have to care about: protecting UBI metadata.
>> There are two kinds of metadata:
>> 1/ those stored at the beginning of each erase block (EC and VID
>>    headers)
>> 2/ those stored in specific volumes (layout and fastmap volumes)
>>
>> We don't have to worry about #2 since those are written using atomic
>> update, and atomic updates are immune to this paired page corruption
>> problem (either the whole write is valid, or none of it is valid).
>>
>> This leaves problem #1.
>> For this case, Artem suggested to duplicate the EC header in the VID
>> header so that if page 0 is corrupted we can recover the EC info from
>> page 1 (which will contain both VID and EC info).
>> Doing that is fine for dealing with EC header corruption, since, AFAIK,
>> none of the NAND vendors are pairing page 0 with page 1.
>> Still remains the VID header corruption problem. To prevent that we
>> still have several solutions:
>> a/ skip the page paired with the VID header. This is doable and can be
>>    hidden from UBI users, but it also means that we're losing another
>>    page for metadata (not a negligible overhead)
>> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
>>    seems the right place to put that in, since fastmap is already
>>    storing that information for almost all blocks. Still we would have
>>    to modify fastmap a bit to store information about all erase blocks
>>    and not only those that are not part of the fastmap pool.
>>    Also, updating that in real-time would require using a log approach,
>>    instead of the atomic update currently used by fastmap when it runs
>>    out of PEBs in its free PEB pool. Note that the log approach does
>>    not have to be applied to all fastmap data (we just need it for the
>>    PEB <-> LEB info).
>>    Another off-topic note regarding the suggested log approach: we
>>    could also use it to log which PEB was last written/erased, and use
>>    that to handle the unstable bits issue.
>> c/ (also suggested by Artem) delay VID write until we have enough data
>>    to write on the LEB, and thus guarantee that it cannot be corrupted
>>    (at least by programming on the paired page ;-)) anymore.
>>    Doing that would also require logging data to be written on those
>>    LEBs somewhere, not to mention the impact of copying the data twice
>>    (once in the log, and then when we have enough data, in the real
>>    block).
>>
>> I don't have any strong opinion about which solution is the best, also
>> I'm maybe missing other aspects or better solutions, so feel free to
>> comment on that and share your thoughts.
> 
> I decided to go for the simplest solution (but I can't promise I won't
> change my mind if this approach appears to be wrong), which is using a
> LEB in either MLC or SLC mode. In SLC mode, only the first page of
> each pair is used, which completely addresses the paired pages problem.
> For now the SLC mode logic is hidden in the MTD/NAND layers, which
> provide functions to read/write in SLC mode.
> 
> Thanks to this differentiation, UBI is now exposing two kinds of LEBs:
> - the secure (small) LEBs (those accessed in SLC mode)
> - the unsecure (big) LEBs (those accessed in MLC mode)
> 
> The secure LEBs are marked as such with a flag in the VID header, which
> allows tracking secure/unsecure LEBs and controlling the maximum size a
> UBI user can read/write from/to a LEB.
> This approach assumes LEB 0 and 1 are never paired together (which

You mean page 0 and 1?

> AFAICT is always true), because VID is stored on page 1 and we need the
> secure_flag information to know how to access the LEB (SLC or MLC mode).
> Of course I expose a few new helpers in the kernel API, and we'll
> probably have to do the same for the ioctl interface too if this
> approach is validated.
> 
> That's all I got for the UBI layer.
> Richard, Artem, any feedback so far?

Changing the on-flash format of UBI is a rather big thing.
If it needs to be done I'm fine with it, but we have to do our best
to change it only once. :-)

>>
>> That's all for the UBI layer. We will likely need new functions (and
>> new fields in existing structures) to help UBI users deal with MLC
>> NANDs: for example a field exposing the storage type or a function
>> helping users skip one (or several) blocks to secure the data they have
>> written so far. Anyway, those are things we can discuss after deciding
>> which approach we want to take.
>>
>> Now, let's talk about the UBIFS layer. We are facing pretty much the
>> same problem in there: we need to protect the data we have already
>> written from time to time.
>> AFAIU (correct me if I'm wrong), data should be secure when we sync the
>> file system, or commit the UBIFS journal (feel free to correct me if
>> I'm not using the right terms in my explanation).
>> As explained earlier, the only way to secure data is to skip some pages
>> (those that are paired with the already written ones).
>>
>> I see two approaches here (there might be more):
>> 1/ do not skip any pages until we are asked to secure the data, and
>>    then skip as many pages as needed to ensure nobody can ever corrupt
>>    the data. With this approach you can lose a non-negligible amount
>>    of space. For example, with this paired pages scheme [1], if you
>>    only write on page 2 and want to secure your data, you'll have
>>    to skip pages 3 to 8.
>> 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a
>>    block). With this solution you always lose half the NAND capacity,
>>    but in case of small writes, it's still more efficient than #1.
>>    Of course using that solution is not acceptable, because you'll
>>    only be able to use half the NAND capacity, but the plan is to use
>>    it in conjunction with the GC, so that from time to time UBIFS
>>    data chunks/nodes can be put in a single erase block without
>>    skipping half the pages.
>>    Note that currently the GC does not work this way: it tries to
>>    collect chunks one by one and write them to the journal to free a
>>    dirty LEB. What we would need here is a way to collect enough data
>>    to fill an entire block and after that release the LEBs that were
>>    previously using half the LEB capacity.
>>
>> Of course both of those solutions imply marking the skipped regions
>> as dirty so that the GC can account for the padded space. For #1 we
>> should probably also use padding nodes to reflect how much space is lost
>> on the media, though I'm not sure how this can be done. For #2, we may
>> have to differentiate 'full' and 'half' LEBs in the LPT.
> 
> If you followed my un/secure LEB approach described above, you probably
> know that we don't have many solutions for the UBIFS layer.
> 
> My idea here is to use a garbage collection mechanism which will
> consolidate data LEBs (LEBs containing valid data nodes).
> By default all LEBs are used in secure (SLC) mode, which makes the
> UBIFS layer reliable. From time to time the consolidation GC will
> choose a few secure LEBs and move their nodes to an unsecure LEB.
> The idea is to fill the entire unsecure LEB, so that we never write on
> it afterwards, thus preventing any paired page corruption. Once this
> copy is finished we can release/unmap the secure LEBs we have
> consolidated (after adding a bud node to reference the unsecure LEB of
> course).
> 
> Here are a few details about the implementation I started to develop
> (questions will come after ;-)).
> I added a new category (called LPROPS_FULL) to track the LEBs that are
> almost full (lp->dirty + lp->free < leb_size / 4), so that we can
> easily consolidate 2 to 3 full LEBs into a single unsecure LEB.
> The consolidation is done by packing as many nodes as possible into an
> unsecure LEB, and a single pass should result in at least one freed
> LEB: the consolidation moves nodes from at least 2 secure LEBs into a
> single one, so you free 2 LEBs but need to keep one for the next
> consolidation iteration, hence the single LEB freed.
> 
> Now comes the questions to the UBIFS experts:
> - should I create a new journal head to do what's described above?
>   AFAICT I can't use the GC head, because the GC can still do its job
>   in parallel with the consolidation-GC, and the GC LEB might already be
>   filled with some data nodes, right?
>   I thought about using the data head, but again, it might already
>   point to a partially filled data LEB.
>   I added a journal head called BIG_DATA_HEAD, but I'm not sure this is
>   acceptable, so let me know what you think about that.

I'd vote for a new head.
If it turns out to be similar enough to another head we can still
merge it to that head.

> - when should we run the consolidation-GC? After the standard GC
>   pass, when this one didn't make any progress, or should we launch
>   it as soon as we have enough full LEBs to fill an unsecure LEB? The
>   second solution might have a small impact on the performance of an
>   empty FS (below half the capacity), but OTOH, it will scale better
>   when the FS size exceeds this limit (no need to run the GC each time
>   we want to write new data).

I'd go for a hybrid approach.
Run the consolidation-GC if standard GC was unable to produce free space
and if more than X small LEBs are full.

> - I still need to understand the races between TNC and GC, since I'm
>   pretty sure I'll face the same kind of problems with the
>   consolidation-GC. Can someone explain that to me, or should I dig
>   further into the code :-)?

Not sure if I understand this question correctly.

What you need for sure is i) a way to find out whether a LEB can be packed
and ii) lock it while packing.

> I'm pretty sure I overlooked a lot of problems here; also note that my
> implementation is not finished yet, so this consolidation-GC concept
> has not been validated. If you see anything that could defeat this
> approach, please let me know so that I can adjust my development.

Please share your patches as soon as possible. Just mark them as RFC
(really flaky code). I'll happily test them on my MLC boards and review them.

Thanks,
//richard


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-27 20:16   ` Richard Weinberger
@ 2015-10-28  9:24     ` Boris Brezillon
  2015-10-28 10:44       ` Michal Suchanek
  0 siblings, 1 reply; 43+ messages in thread
From: Boris Brezillon @ 2015-10-28  9:24 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

Hi Richard,

On Tue, 27 Oct 2015 21:16:28 +0100
Richard Weinberger <richard@nod.at> wrote:

> Boris,
> 
> Am 23.10.2015 um 10:14 schrieb Boris Brezillon:
> >> I'm currently working on the paired pages problem we have on MLC chips.
> >> I remember discussing it with Artem earlier this year when I was
> >> preparing my talk for ELC.
> >>
> >> I now have some time I can spend working on this problem and I started
> >> looking at how this can be solved.
> >>
> >> First let's take a look at the UBI layer.
> >> There's one basic thing we have to care about: protecting UBI metadata.
> >> There are two kinds of metadata:
> >> 1/ those stored at the beginning of each erase block (EC and VID
> >>    headers)
> >> 2/ those stored in specific volumes (layout and fastmap volumes)
> >>
> >> We don't have to worry about #2 since those are written using atomic
> >> update, and atomic updates are immune to this paired page corruption
> >> problem (either the whole write is valid, or none of it is valid).
> >>
> >> This leaves problem #1.
> >> For this case, Artem suggested to duplicate the EC header in the VID
> >> header so that if page 0 is corrupted we can recover the EC info from
> >> page 1 (which will contain both VID and EC info).
> >> Doing that is fine for dealing with EC header corruption, since, AFAIK,
> >> none of the NAND vendors are pairing page 0 with page 1.
> >> Still remains the VID header corruption problem. To prevent that we
> >> still have several solutions:
> >> a/ skip the page paired with the VID header. This is doable and can be
> >>    hidden from UBI users, but it also means that we're losing another
> >>    page for metadata (not a negligible overhead)
> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
> >>    seems the right place to put that in, since fastmap is already
> >>    storing that information for almost all blocks. Still we would have
> >>    to modify fastmap a bit to store information about all erase blocks
> >>    and not only those that are not part of the fastmap pool.
> >>    Also, updating that in real-time would require using a log approach,
> >>    instead of the atomic update currently used by fastmap when it runs
> >>    out of PEBs in its free PEB pool. Note that the log approach does
> >>    not have to be applied to all fastmap data (we just need it for the
> >>    PEB <-> LEB info).
> >>    Another off-topic note regarding the suggested log approach: we
> >>    could also use it to log which PEB was last written/erased, and use
> >>    that to handle the unstable bits issue.
> >> c/ (also suggested by Artem) delay VID write until we have enough data
> >>    to write on the LEB, and thus guarantee that it cannot be corrupted
> >>    (at least by programming on the paired page ;-)) anymore.
> >>    Doing that would also require logging data to be written on those
> >>    LEBs somewhere, not to mention the impact of copying the data twice
> >>    (once in the log, and then when we have enough data, in the real
> >>    block).
> >>
> >> I don't have any strong opinion about which solution is the best, also
> >> I'm maybe missing other aspects or better solutions, so feel free to
> >> comment on that and share your thoughts.
> > 
> > I decided to go for the simplest solution (but I can't promise I won't
> > change my mind if this approach appears to be wrong), which is using a
> > LEB in either MLC or SLC mode. In SLC mode, only the first page of
> > each pair is used, which completely addresses the paired pages problem.
> > For now the SLC mode logic is hidden in the MTD/NAND layers, which
> > provide functions to read/write in SLC mode.
> > 
> > Thanks to this differentiation, UBI is now exposing two kinds of LEBs:
> > - the secure (small) LEBs (those accessed in SLC mode)
> > - the unsecure (big) LEBs (those accessed in MLC mode)
> > 
> > The secure LEBs are marked as such with a flag in the VID header, which
> > allows tracking secure/unsecure LEBs and controlling the maximum size a
> > UBI user can read/write from/to a LEB.
> > This approach assumes LEB 0 and 1 are never paired together (which
> 
> You mean page 0 and 1?

Yes.

> 
> > AFAICT is always true), because VID is stored on page 1 and we need the
> > secure_flag information to know how to access the LEB (SLC or MLC mode).
> > Of course I expose a few new helpers in the kernel API, and we'll
> > probably have to do the same for the ioctl interface too if this
> > approach is validated.
> > 
> > That's all I got for the UBI layer.
> > Richard, Artem, any feedback so far?
> 
> Changing the on-flash format of UBI is a rather big thing.
> If it needs to be done I'm fine with it, but we have to do our best
> to change it only once. :-)

Yes, I know that, and I don't pretend I chose the right solution ;-).
Any other suggestions to avoid changing the on-flash format?

Note that I only added a new flag, and this flag is only set when you
map a LEB in SLC mode, which is not the default case; this in turn
means you'll still be able to attach an existing UBI partition. Of
course the reverse is not true: once you've started using the secure
LEB feature you can't attach the image with a UBI implementation that
does not support it.

> 
> >>
> >> That's all for the UBI layer. We will likely need new functions (and
> >> new fields in existing structures) to help UBI users deal with MLC
> >> NANDs: for example a field exposing the storage type or a function
> >> helping users skip one (or several) blocks to secure the data they have
> >> written so far. Anyway, those are things we can discuss after deciding
> >> which approach we want to take.
> >>
> >> Now, let's talk about the UBIFS layer. We are facing pretty much the
> >> same problem in there: we need to protect the data we have already
> >> written from time to time.
> >> AFAIU (correct me if I'm wrong), data should be secure when we sync the
> >> file system, or commit the UBIFS journal (feel free to correct me if
> >> I'm not using the right terms in my explanation).
> >> As explained earlier, the only way to secure data is to skip some pages
> >> (those that are paired with the already written ones).
> >>
> >> I see two approaches here (there might be more):
> >> 1/ do not skip any pages until we are asked to secure the data, and
> >>    then skip as many pages as needed to ensure nobody can ever corrupt
> >>    the data. With this approach you can lose a non-negligible amount
> >>    of space. For example, with this paired pages scheme [1], if you
> >>    only write on page 2 and want to secure your data, you'll have
> >>    to skip pages 3 to 8.
> >> 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a
> >>    block). With this solution you always lose half the NAND capacity,
> >>    but in case of small writes, it's still more efficient than #1.
> >>    Of course using that solution is not acceptable, because you'll
> >>    only be able to use half the NAND capacity, but the plan is to use
> >>    it in conjunction with the GC, so that from time to time UBIFS
> >>    data chunks/nodes can be put in a single erase block without
> >>    skipping half the pages.
> >>    Note that currently the GC does not work this way: it tries to
> >>    collect chunks one by one and write them to the journal to free a
> >>    dirty LEB. What we would need here is a way to collect enough data
> >>    to fill an entire block and after that release the LEBs that were
> >>    previously using half the LEB capacity.
> >>
> >> Of course both of those solutions imply marking the skipped regions
> >> as dirty so that the GC can account for the padded space. For #1 we
> >> should probably also use padding nodes to reflect how much space is lost
> >> on the media, though I'm not sure how this can be done. For #2, we may
> >> have to differentiate 'full' and 'half' LEBs in the LPT.
> > 
> > If you followed my un/secure LEB approach described above, you probably
> > know that we don't have many solutions for the UBIFS layer.
> > 
> > My idea here is to use a garbage collection mechanism which will
> > consolidate data LEBs (LEBs containing valid data nodes).
> > By default all LEBs are used in secure (SLC) mode, which makes the
> > UBIFS layer reliable. From time to time the consolidation GC will
> > choose a few secure LEBs and move their nodes to an unsecure LEB.
> > The idea is to fill the entire unsecure LEB, so that we never write on
> > it afterwards, thus preventing any paired page corruption. Once this
> > copy is finished we can release/unmap the secure LEBs we have
> > consolidated (after adding a bud node to reference the unsecure LEB of
> > course).
> > 
> > Here are a few details about the implementation I started to develop
> > (questions will come after ;-)).
> > I added a new category (called LPROPS_FULL) to track the LEBs that are
> > almost full (lp->dirty + lp->free < leb_size / 4), so that we can
> > easily consolidate 2 to 3 full LEBs into a single unsecure LEB.
> > The consolidation is done by packing as many nodes as possible into an
> > unsecure LEB, and a single pass should result in at least one freed
> > LEB: the consolidation moves nodes from at least 2 secure LEBs into a
> > single one, so you free 2 LEBs but need to keep one for the next
> > consolidation iteration, hence the single LEB freed.
> > 
> > Now comes the questions to the UBIFS experts:
> > - should I create a new journal head to do what's described above?
> >   AFAICT I can't use the GC head, because the GC can still do its job
> >   in parallel with the consolidation-GC, and the GC LEB might already be
> >   filled with some data nodes, right?
> >   I thought about using the data head, but again, it might already
> >   point to a partially filled data LEB.
> >   I added a journal head called BIG_DATA_HEAD, but I'm not sure this is
> >   acceptable, so let me know what you think about that.
> 
> I'd vote for a new head.
> If it turns out to be similar enough to another head we can still
> merge it to that head.

Yep, that's what I chose too. Actually, AFAIU, if we want the standard
and consolidation GC to work concurrently we need to add a new journal
head anyway.

> 
> > - when should we run the consolidation-GC? After the standard GC
> >   pass, when this one didn't make any progress, or should we launch
> >   it as soon as we have enough full LEBs to fill an unsecure LEB? The
> >   second solution might have a small impact on the performance of an
> >   empty FS (below half the capacity), but OTOH, it will scale better
> >   when the FS size exceeds this limit (no need to run the GC each time
> >   we want to write new data).
> 
> I'd go for a hybrid approach.
> Run the consolidation-GC if standard GC was unable to produce free space
> and if more than X small LEBs are full.

That's probably the best solution indeed.

> 
> > - I still need to understand the races between TNC and GC, since I'm
> >   pretty sure I'll face the same kind of problems with the
> >   consolidation-GC. Can someone explain that to me, or should I dig
> >   further into the code :-)?
> 
> Not sure if I understand this question correctly.
> 
> What you need for sure is i) a way to find out whether a LEB can be packed
> and ii) lock it while packing.

Hm, locking the whole TNC while we are consolidating several LEBs seems
a bit extreme (writing a whole unsecure LEB can take a non-negligible
amount of time). I think we can do this consolidation without taking
the TNC lock by first writing all the nodes to the new LEB without
updating the TNC, and once the unsecure LEB is filled, update the TNC
in one go (that's what I'm trying to do here [1]).
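A toy model of this two-phase idea, assuming nothing about the real
UBIFS structures (the array-based TNC, the tnc_entry/loc types, the
lock and the function name are all stand-ins):

```c
#include <pthread.h>

struct loc { int lnum, offs; };
struct tnc_entry { int key; struct loc loc; };

static pthread_mutex_t tnc_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Phase 1 (not shown): copy all live nodes to the consolidated LEB
 * without touching the TNC, recording each node's new location in an
 * "updates" array.
 * Phase 2: apply every recorded location update in one short critical
 * section, instead of holding the TNC lock for the whole slow copy.
 */
static void tnc_apply_updates(struct tnc_entry *tnc, int tnc_len,
			      const struct tnc_entry *updates, int n)
{
	int i, j;

	pthread_mutex_lock(&tnc_lock);
	for (i = 0; i < n; i++)
		for (j = 0; j < tnc_len; j++)
			if (tnc[j].key == updates[i].key)
				tnc[j].loc = updates[i].loc;
	pthread_mutex_unlock(&tnc_lock);
}
```

The point of the sketch is only the lock placement: the expensive LEB
write happens before the lock is ever taken.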

> 
> > I'm pretty sure I overlooked a lot of problems here; also note that my
> > implementation is not finished yet, so this consolidation-GC concept
> > has not been validated. If you see anything that could defeat this
> > approach, please let me know so that I can adjust my development.
> 
> Please share your patches as soon as possible. Just mark them as RFC
> (really flaky code). I'll happily test them on my MLC boards and review them.

I can share the code (actually it's already on my github repo [2]), but
it's not even tested, so don't expect it to work on your board ;-).

Thanks for your first suggestions.

Best Regards,

Boris

[1]https://github.com/bbrezillon/linux-sunxi/blob/23cb262f1c73d24b2a52f41f91fb4c6c1305e8e7/fs/ubifs/gc.c#L739
[2]https://github.com/bbrezillon/linux-sunxi/tree/mlc-wip

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-28  9:24     ` Boris Brezillon
@ 2015-10-28 10:44       ` Michal Suchanek
  2015-10-28 11:14         ` Boris Brezillon
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Suchanek @ 2015-10-28 10:44 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Artem Bityutskiy, Andrea Scian, MTD Maling List, Brian Norris,
	David Woodhouse, Bean Huo 霍斌斌 (beanhuo)

On 28 October 2015 at 10:24, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> Hi Richard,
>
> On Tue, 27 Oct 2015 21:16:28 +0100
> Richard Weinberger <richard@nod.at> wrote:
>
>> Boris,
>>
>> Am 23.10.2015 um 10:14 schrieb Boris Brezillon:
>> >> I'm currently working on the paired pages problem we have on MLC chips.
>> >> I remember discussing it with Artem earlier this year when I was
>> >> preparing my talk for ELC.
>> >>
>> >> I now have some time I can spend working on this problem and I started
>> >> looking at how this can be solved.
>> >>
>> >> First let's take a look at the UBI layer.
>> >> There's one basic thing we have to care about: protecting UBI metadata.
>> >> There are two kinds of metadata:
>> >> 1/ those stored at the beginning of each erase block (EC and VID
>> >>    headers)
>> >> 2/ those stored in specific volumes (layout and fastmap volumes)
>> >>
>> >> We don't have to worry about #2 since those are written using atomic
>> >> update, and atomic updates are immune to this paired page corruption
>> >> problem (either the whole write is valid, or none of it is valid).
>> >>
>> >> This leaves problem #1.
>> >> For this case, Artem suggested to duplicate the EC header in the VID
>> >> header so that if page 0 is corrupted we can recover the EC info from
>> >> page 1 (which will contain both VID and EC info).
>> >> Doing that is fine for dealing with EC header corruption, since, AFAIK,
>> >> none of the NAND vendors are pairing page 0 with page 1.
>> >> Still remains the VID header corruption problem. Do prevent that we
>> >> still have several solutions:
>> >> a/ skip the page paired with the VID header. This is doable and can be
>> >>    hidden from UBI users, but it also means that we're loosing another
>> >>    page for metadata (not a negligible overhead)
>> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
>> >>    seems the right place to put that in, since fastmap is already
>> >>    storing those information for almost all blocks. Still we would have
>> >>    to modify fastmap a bit to store information about all erase blocks
>> >>    and not only those that are not part of the fastmap pool.
>> >>    Also, updating that in real-time would require using a log approach,
>> >>    instead of the atomic update currently used by fastmap when it runs
>> >>    out of PEBs in it's free PEB pool. Note that the log approach does
>> >>    not have to be applied to all fastmap data (we just need it for the
>> >>    PEB <-> LEB info).
>> >>    Another off-topic note regarding the suggested log approach: we
>> >>    could also use it to log which PEB was last written/erased, and use
>> >>    that to handle the unstable bits issue.
>> >> c/ (also suggested by Artem) delay VID write until we have enough data
>> >>    to write on the LEB, and thus guarantee that it cannot be corrupted
>> >>    (at least by programming on the paired page ;-)) anymore.
>> >>    Doing that would also require logging data to be written on those
>> >>    LEBs somewhere, not to mention the impact of copying the data twice
>> >>    (once in the log, and then when we have enough data, in the real
>> >>    block).
>> >>
>> >> I don't have any strong opinion about which solution is the best, also
>> >> I'm maybe missing other aspects or better solutions, so feel free to
>> >> comment on that and share your thoughts.
>> >
>> > I decided to go for the simplest solution (but I can't promise I won't
>> > change my mind if this approach appears to be wrong), which is either
>> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of
>> > each pair is used, which completely address the paired pages problem.
>> > For now the SLC mode logic is hidden in the MTD/NAND layers which are
>> > providing functions to write/read in SLC mode.
>> >
>> > Thanks to this differentiation, UBI is now exposing two kind of LEBs:
>> > - the secure (small) LEBS (those accessed in SLC mode)
>> > - the unsecure (big) LEBS (those accessed in MLC mode)
>> >
>> > The secure LEBs are marked as such with a flag in the VID header, which
>> > allows tracking secure/unsecure LEBs and controlling the maximum size a
>> > UBI user can read/write from/to a LEB.
>> > This approach assume LEB 0 and 1 are never paired together (which
>>
>> You mean page 0 and 1?
>
> Yes.
>
>>
>> > AFAICT is always true), because VID is stored on page 1 and we need the
>> > secure_flag information to know how to access the LEB (SLC or MLC mode).
>> > Of course I expose a few new helpers in the kernel API, and we'll
>> > probably have to do it for the ioctl interface too if this approach is
>> > validated.
>> >
>> > That's all I got for the UBI layer.
>> > Richard, Artem, any feedback so far?
>>
>> Changing the on-flash format of UBI is a rather big thing.
>> If it needs to be done I'm fine with it but we have to give our best
>> to change it only once. :-)
>
> Yes, I know that, and I don't pretend I chose the right solution ;-),
> any other suggestions to avoid changing the on-flash format?
>
> Note that I only added a new flag, and this flag is only set when you
> map a LEB in SLC mode, which is not the default case, which in turn
> means you'll be able to attach to an existing UBI partition. Of course
> the reverse is not true, once you've started using the secure LEB
> feature you can't attach this image with an UBI implementation that does
> not support this feature.

Isn't a secure LEB just a plain LEB with half of its pages unused? Since
you normally only write secure LEBs, write unsecure LEBs only from the
garbage collector, and can tell a secure LEB apart by the layout of its
used pages, there isn't really any need for special marking, AFAICT.

It might be a good idea to refuse to mount a flash which is supposed
to be protected against page corruption with a driver that does not
support that protection.

On the other hand, if backwards compatibility is desired and the
information can be stored without introducing a new flag it might be a
good idea to allow that as well.

Thanks

Michal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-28 10:44       ` Michal Suchanek
@ 2015-10-28 11:14         ` Boris Brezillon
  2015-10-28 15:50           ` Michal Suchanek
  0 siblings, 1 reply; 43+ messages in thread
From: Boris Brezillon @ 2015-10-28 11:14 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Artem Bityutskiy, Andrea Scian, MTD Mailing List, Brian Norris,
	David Woodhouse, Bean Huo 霍斌斌 (beanhuo)

On Wed, 28 Oct 2015 11:44:49 +0100
Michal Suchanek <hramrach@gmail.com> wrote:

> On 28 October 2015 at 10:24, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
> > Hi Richard,
> >
> > On Tue, 27 Oct 2015 21:16:28 +0100
> > Richard Weinberger <richard@nod.at> wrote:
> >
> >> Boris,
> >>
> >> Am 23.10.2015 um 10:14 schrieb Boris Brezillon:
> >> >> I'm currently working on the paired pages problem we have on MLC chips.
> >> >> I remember discussing it with Artem earlier this year when I was
> >> >> preparing my talk for ELC.
> >> >>
> >> >> I now have some time I can spend working on this problem and I started
> >> >> looking at how this can be solved.
> >> >>
> >> >> First let's take a look at the UBI layer.
> >> >> There's one basic thing we have to care about: protecting UBI metadata.
> >> >> There are two kind of metadata:
> >> >> 1/ those stored at the beginning of each erase block (EC and VID
> >> >>    headers)
> >> >> 2/ those stored in specific volumes (layout and fastmap volumes)
> >> >>
> >> >> We don't have to worry about #2 since those are written using atomic
> >> >> update, and atomic updates are immune to this paired page corruption
> >> >> problem (either the whole write is valid, or none of it is valid).
> >> >>
> >> >> This leaves problem #1.
> >> >> For this case, Artem suggested to duplicate the EC header in the VID
> >> >> header so that if page 0 is corrupted we can recover the EC info from
> >> >> page 1 (which will contain both VID and EC info).
> >> >> Doing that is fine for dealing with EC header corruption, since, AFAIK,
> >> >> none of the NAND vendors are pairing page 0 with page 1.
> >> >> Still remains the VID header corruption problem. Do prevent that we
> >> >> still have several solutions:
> >> >> a/ skip the page paired with the VID header. This is doable and can be
> >> >>    hidden from UBI users, but it also means that we're loosing another
> >> >>    page for metadata (not a negligible overhead)
> >> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
> >> >>    seems the right place to put that in, since fastmap is already
> >> >>    storing those information for almost all blocks. Still we would have
> >> >>    to modify fastmap a bit to store information about all erase blocks
> >> >>    and not only those that are not part of the fastmap pool.
> >> >>    Also, updating that in real-time would require using a log approach,
> >> >>    instead of the atomic update currently used by fastmap when it runs
> >> >>    out of PEBs in it's free PEB pool. Note that the log approach does
> >> >>    not have to be applied to all fastmap data (we just need it for the
> >> >>    PEB <-> LEB info).
> >> >>    Another off-topic note regarding the suggested log approach: we
> >> >>    could also use it to log which PEB was last written/erased, and use
> >> >>    that to handle the unstable bits issue.
> >> >> c/ (also suggested by Artem) delay VID write until we have enough data
> >> >>    to write on the LEB, and thus guarantee that it cannot be corrupted
> >> >>    (at least by programming on the paired page ;-)) anymore.
> >> >>    Doing that would also require logging data to be written on those
> >> >>    LEBs somewhere, not to mention the impact of copying the data twice
> >> >>    (once in the log, and then when we have enough data, in the real
> >> >>    block).
> >> >>
> >> >> I don't have any strong opinion about which solution is the best, also
> >> >> I'm maybe missing other aspects or better solutions, so feel free to
> >> >> comment on that and share your thoughts.
> >> >
> >> > I decided to go for the simplest solution (but I can't promise I won't
> >> > change my mind if this approach appears to be wrong), which is either
> >> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of
> >> > each pair is used, which completely address the paired pages problem.
> >> > For now the SLC mode logic is hidden in the MTD/NAND layers which are
> >> > providing functions to write/read in SLC mode.
> >> >
> >> > Thanks to this differentiation, UBI is now exposing two kind of LEBs:
> >> > - the secure (small) LEBS (those accessed in SLC mode)
> >> > - the unsecure (big) LEBS (those accessed in MLC mode)
> >> >
> >> > The secure LEBs are marked as such with a flag in the VID header, which
> >> > allows tracking secure/unsecure LEBs and controlling the maximum size a
> >> > UBI user can read/write from/to a LEB.
> >> > This approach assume LEB 0 and 1 are never paired together (which
> >>
> >> You mean page 0 and 1?
> >
> > Yes.
> >
> >>
> >> > AFAICT is always true), because VID is stored on page 1 and we need the
> >> > secure_flag information to know how to access the LEB (SLC or MLC mode).
> >> > Of course I expose a few new helpers in the kernel API, and we'll
> >> > probably have to do it for the ioctl interface too if this approach is
> >> > validated.
> >> >
> >> > That's all I got for the UBI layer.
> >> > Richard, Artem, any feedback so far?
> >>
> >> Changing the on-flash format of UBI is a rather big thing.
> >> If it needs to be done I'm fine with it but we have to give our best
> >> to change it only once. :-)
> >
> > Yes, I know that, and I don't pretend I chose the right solution ;-),
> > any other suggestions to avoid changing the on-flash format?
> >
> > Note that I only added a new flag, and this flag is only set when you
> > map a LEB in SLC mode, which is not the default case, which in turn
> > means you'll be able to attach to an existing UBI partition. Of course
> > the reverse is not true, once you've started using the secure LEB
> > feature you can't attach this image with an UBI implementation that does
> > not support this feature.
> 
> Isn't a secure LEB just a plain LEB with half pages unused? Since you
> only write secure LEBs normally and unsecure LEBs only in garbage
> collector and you can tell secure LEB by the layout of used pages
> there isn't really need for special marking AFAICFT

This implies scanning several pages per block to determine which type
of LEB is in use, which would drastically increase the attach time.
The whole point of this flag is to avoid scanning anything other than
the EC and VID headers (or the fastmap LEBs if fastmap is in use).

> 
> It might be a good idea to not allow mounting a flash which is
> supposed to be protected against page corruption with a driver that
> does not support that protection.

That can be done by incrementing the UBI_VERSION value...

> 
> On the other hand, if backwards compatibility is desired and the
> information can be stored without introducing a new flag it might be a
> good idea to allow that as well.

... but I agree that we should avoid breaking backward compatibility
if that's possible.



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
                   ` (2 preceding siblings ...)
  2015-10-23  8:14 ` Boris Brezillon
@ 2015-10-28 12:06 ` Artem Bityutskiy
  3 siblings, 0 replies; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-28 12:06 UTC (permalink / raw)
  To: Boris Brezillon, Richard Weinberger
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Qi Wang 王起 (qiwang), Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

On Thu, 2015-09-17 at 15:22 +0200, Boris Brezillon wrote:
> 1/ do not skip any pages until we are asked to secure the data, and
>    then skip as much pages as needed to ensure nobody can ever
> corrupt
>    the data. With this approach you can loose a non negligible amount
>    of space. For example, with this paired pages scheme [1], if you
>    only write page on page 2 and want to secure your data, you'll
> have
>    to skip pages 3 to 8.

This sounds like the right way to go to me.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-23  8:14 ` Boris Brezillon
  2015-10-27 20:16   ` Richard Weinberger
@ 2015-10-28 12:24   ` Artem Bityutskiy
  2015-10-30  8:15     ` Boris Brezillon
  1 sibling, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-28 12:24 UTC (permalink / raw)
  To: Boris Brezillon, Richard Weinberger
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> I decided to go for the simplest solution (but I can't promise I
> won't
> change my mind if this approach appears to be wrong), which is either
> using a LEB is MLC or SLC mode. In SLC modes, only the first page of
> each pair is used, which completely address the paired pages problem.
> For now the SLC mode logic is hidden in the MTD/NAND layers which are
> providing functions to write/read in SLC mode.

Most of the writes go through the journalling subsystem.

There are also some non-journal writes, related to internal meta-data
management, coming from other subsystems: the log, the master node,
the LPT, the index, and GC.

In the case of the journal subsystem, in MLC mode you just skip pages
every time the "flush write-buffer" API call is used.

In the LPT subsystem, you invent a custom solution and skip pages as
needed.

For the master node, probably nothing needs to be done, since we have
2 copies.

Index and GC data also go via the journal, so the journal subsystem
solution will probably cover them.


> Thanks to this differentiation, UBI is now exposing two kind of LEBs:
> - the secure (small) LEBS (those accessed in SLC mode)
> - the unsecure (big) LEBS (those accessed in MLC mode)

Is this really necessary? It feels like a bit of an over-complication
of the UBI layer.

Can UBI take care of itself WRT MLC safeness, and let UBIFS take care
of itself?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-28 11:14         ` Boris Brezillon
@ 2015-10-28 15:50           ` Michal Suchanek
  0 siblings, 0 replies; 43+ messages in thread
From: Michal Suchanek @ 2015-10-28 15:50 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Artem Bityutskiy, Andrea Scian, MTD Mailing List, Brian Norris,
	David Woodhouse, Bean Huo 霍斌斌 (beanhuo)

On 28 October 2015 at 12:14, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Wed, 28 Oct 2015 11:44:49 +0100
> Michal Suchanek <hramrach@gmail.com> wrote:
>
>> On 28 October 2015 at 10:24, Boris Brezillon
>> <boris.brezillon@free-electrons.com> wrote:
>> > Hi Richard,
>> >
>> > On Tue, 27 Oct 2015 21:16:28 +0100
>> > Richard Weinberger <richard@nod.at> wrote:
>> >
>> >> Boris,
>> >>
>> >> Am 23.10.2015 um 10:14 schrieb Boris Brezillon:
>> >> >> I'm currently working on the paired pages problem we have on MLC chips.
>> >> >> I remember discussing it with Artem earlier this year when I was
>> >> >> preparing my talk for ELC.
>> >> >>
>> >> >> I now have some time I can spend working on this problem and I started
>> >> >> looking at how this can be solved.
>> >> >>
>> >> >> First let's take a look at the UBI layer.
>> >> >> There's one basic thing we have to care about: protecting UBI metadata.
>> >> >> There are two kind of metadata:
>> >> >> 1/ those stored at the beginning of each erase block (EC and VID
>> >> >>    headers)
>> >> >> 2/ those stored in specific volumes (layout and fastmap volumes)
>> >> >>
>> >> >> We don't have to worry about #2 since those are written using atomic
>> >> >> update, and atomic updates are immune to this paired page corruption
>> >> >> problem (either the whole write is valid, or none of it is valid).
>> >> >>
>> >> >> This leaves problem #1.
>> >> >> For this case, Artem suggested to duplicate the EC header in the VID
>> >> >> header so that if page 0 is corrupted we can recover the EC info from
>> >> >> page 1 (which will contain both VID and EC info).
>> >> >> Doing that is fine for dealing with EC header corruption, since, AFAIK,
>> >> >> none of the NAND vendors are pairing page 0 with page 1.
>> >> >> Still remains the VID header corruption problem. Do prevent that we
>> >> >> still have several solutions:
>> >> >> a/ skip the page paired with the VID header. This is doable and can be
>> >> >>    hidden from UBI users, but it also means that we're loosing another
>> >> >>    page for metadata (not a negligible overhead)
>> >> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap
>> >> >>    seems the right place to put that in, since fastmap is already
>> >> >>    storing those information for almost all blocks. Still we would have
>> >> >>    to modify fastmap a bit to store information about all erase blocks
>> >> >>    and not only those that are not part of the fastmap pool.
>> >> >>    Also, updating that in real-time would require using a log approach,
>> >> >>    instead of the atomic update currently used by fastmap when it runs
>> >> >>    out of PEBs in it's free PEB pool. Note that the log approach does
>> >> >>    not have to be applied to all fastmap data (we just need it for the
>> >> >>    PEB <-> LEB info).
>> >> >>    Another off-topic note regarding the suggested log approach: we
>> >> >>    could also use it to log which PEB was last written/erased, and use
>> >> >>    that to handle the unstable bits issue.
>> >> >> c/ (also suggested by Artem) delay VID write until we have enough data
>> >> >>    to write on the LEB, and thus guarantee that it cannot be corrupted
>> >> >>    (at least by programming on the paired page ;-)) anymore.
>> >> >>    Doing that would also require logging data to be written on those
>> >> >>    LEBs somewhere, not to mention the impact of copying the data twice
>> >> >>    (once in the log, and then when we have enough data, in the real
>> >> >>    block).
>> >> >>
>> >> >> I don't have any strong opinion about which solution is the best, also
>> >> >> I'm maybe missing other aspects or better solutions, so feel free to
>> >> >> comment on that and share your thoughts.
>> >> >
>> >> > I decided to go for the simplest solution (but I can't promise I won't
>> >> > change my mind if this approach appears to be wrong), which is either
>> >> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of
>> >> > each pair is used, which completely address the paired pages problem.
>> >> > For now the SLC mode logic is hidden in the MTD/NAND layers which are
>> >> > providing functions to write/read in SLC mode.
>> >> >
>> >> > Thanks to this differentiation, UBI is now exposing two kind of LEBs:
>> >> > - the secure (small) LEBS (those accessed in SLC mode)
>> >> > - the unsecure (big) LEBS (those accessed in MLC mode)
>> >> >
>> >> > The secure LEBs are marked as such with a flag in the VID header, which
>> >> > allows tracking secure/unsecure LEBs and controlling the maximum size a
>> >> > UBI user can read/write from/to a LEB.
>> >> > This approach assume LEB 0 and 1 are never paired together (which
>> >>
>> >> You mean page 0 and 1?
>> >
>> > Yes.
>> >
>> >>
>> >> > AFAICT is always true), because VID is stored on page 1 and we need the
>> >> > secure_flag information to know how to access the LEB (SLC or MLC mode).
>> >> > Of course I expose a few new helpers in the kernel API, and we'll
>> >> > probably have to do it for the ioctl interface too if this approach is
>> >> > validated.
>> >> >
>> >> > That's all I got for the UBI layer.
>> >> > Richard, Artem, any feedback so far?
>> >>
>> >> Changing the on-flash format of UBI is a rather big thing.
>> >> If it needs to be done I'm fine with it but we have to give our best
>> >> to change it only once. :-)
>> >
>> > Yes, I know that, and I don't pretend I chose the right solution ;-),
>> > any other suggestions to avoid changing the on-flash format?
>> >
>> > Note that I only added a new flag, and this flag is only set when you
>> > map a LEB in SLC mode, which is not the default case, which in turn
>> > means you'll be able to attach to an existing UBI partition. Of course
>> > the reverse is not true, once you've started using the secure LEB
>> > feature you can't attach this image with an UBI implementation that does
>> > not support this feature.
>>
>> Isn't a secure LEB just a plain LEB with half pages unused? Since you
>> only write secure LEBs normally and unsecure LEBs only in garbage
>> collector and you can tell secure LEB by the layout of used pages
>> there isn't really need for special marking AFAICFT
>
> This implies scanning several pages per block to determine which type
> of LEB is in use, which will drastically increase the attach time.
> The whole point of this flag is to avoid scanning anything else but the
> EC and VID headers (or the fastmap LEBs if fastmap is in use).

Why do you need to scan anything more than you would normally?

You assume that any blocks already written are written correctly, you
write any new blocks securely, and you perform garbage collection to
condense blocks which have unused pages that cannot be written
securely or which were already used and hold stale data. The current
data format supposedly already allows determining which pages are in
use and which are not without extra scanning, so there is no more work
to be done. The only issue I see is that blocks which are writable by
the current driver cannot be written securely in some cases. You have
to deal with that while upgrading a filesystem from the old format
anyway, so not changing the format will just exercise the upgrade code
path more.

Thanks

Michal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-28 12:24   ` Artem Bityutskiy
@ 2015-10-30  8:15     ` Boris Brezillon
  2015-10-30  8:21       ` Boris Brezillon
                         ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30  8:15 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 "(beanhuo)"

Hi Artem,

Don't take the following answer as an attempt to teach you how
UBI/UBIFS work or should work with MLC NANDs. I'm still listening to
your suggestions, but when I had a look at how this "skip pages on
demand" approach could be implemented I realized it was not so simple.

Also, if you don't mind I'd like to finish my consolidation-GC
implementation before trying a new approach, which doesn't mean I
won't consider the "skip pages on demand" one.

On Wed, 28 Oct 2015 14:24:45 +0200
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> > I decided to go for the simplest solution (but I can't promise I
> > won't
> > change my mind if this approach appears to be wrong), which is either
> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of
> > each pair is used, which completely address the paired pages problem.
> > For now the SLC mode logic is hidden in the MTD/NAND layers which are
> > providing functions to write/read in SLC mode.
> 
> Most of the writes go through the journalling subsystem.
> 
> There are some non-journal writes, related to internal meta-date
> management, like from other subsystems: log, the master node, LPT,
> index, GC.
> 
> In case of journal subsystem, in MLC mode you just skip pages every
> time the "flush write-buffer" API call is used.
> 
> In LPT subsystem, you invent a custom solution, skip pages as needed.
> 
> In master - probably nothing needs to be done, since we have 2 copies.
> 
> Index, GC - data also goes via journal, so the journal subsystem
> solution will probably cover it.

For the general concept I agree that it should probably work, but here
are my concerns (maybe you'll prove me wrong ;-)):

1/ will you ever be able to use a full LEB without skipping any pages?
I mean, when using the "skip pages on demand" approach you can easily
have more than half the pages in your LEB skipped, because when you
write only one page, you'll have to skip between 3 and 8 pages (it
depends on the pairing scheme). I'll try to gather some statistics on
how often wbufs are synced to see if that's a real problem.
The consolidation approach has the advantage of being able to
consolidate existing LEBs to completely fill them, but the
consolidation stuff could probably work with the "skip pages on
demand" approach too.

2/ skipping pages on demand is not as easy as simply writing only the
lower page of each pair. As you might know, when skipping pages to
secure your data, you'll also have to skip some lower pages so that
you end up with an offset to a memory region that can be contiguously
written to, and when you skip those lower pages, you still have to
program them, because NAND chips require that the lower page of each
pair be programmed before the higher one (ignoring this will just
render some pages unreliable).

3/ UBIFS is really picky when it comes to corrupted node detection,
and there are a few cases where it refuses to mount the FS when a
corrupted node is detected. One of these cases is when the corrupted
page (filled with one or several nodes) is filled with non-ff data,
which is likely to happen with MLC NANDs (paired pages are not
contiguous). We discussed relaxing this policy a few weeks ago, but
what should we do when such a corruption is detected? Drop all nodes
with a sequence number higher than or equal to that of the last valid
node on the LEB?
Note that with the consolidation-GC approach we don't have this
problem, because the consolidated LEB is added to the journal after it
has been completely filled with data, and is marked as full
(->free = 0) so that nobody can reclaim it to write data on it.

> 
> 
> > Thanks to this differentiation, UBI is now exposing two kind of LEBs:
> > - the secure (small) LEBS (those accessed in SLC mode)
> > - the unsecure (big) LEBS (those accessed in MLC mode)
> 
> Is this really necessary? Feels like a bit of over-complication to the
> UBI layer.

Hm, it's actually not that complicated: SLC mode is implemented by the
NAND layer, and UBI just uses MTD functions to access the NAND in SLC
mode. I'm more concerned by the on-flash format change problem raised
by Richard.

> 
> Can UBI care about itself WRT MLC safeness, and let UBIFS care about
> itself?
> 

Sorry, but I don't agree here. By exposing the secure LEB concept, UBI
does not specifically cater to UBIFS; it just provides a way for all
UBI users to address the problem brought by paired pages in a generic
way.
Maybe the secure LEB approach is wrong, but in the end UBI will expose
other functions to handle those paired-page problems
(ubi_secure_data() to skip pages, for example), and this layering
(NAND/MTD/UBI/UBIFS) is IMO the only sane way to let each layer handle
what it's supposed to handle and let the upper layers use the new
features to mitigate the problems.
So, no matter which solution is chosen, it will impact the UBI, MTD,
and NAND layers.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15     ` Boris Brezillon
@ 2015-10-30  8:21       ` Boris Brezillon
  2015-10-30  8:50       ` Bean Huo 霍斌斌 (beanhuo)
  2015-10-30  9:08       ` Artem Bityutskiy
  2 siblings, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30  8:21 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 "(beanhuo)"

On Fri, 30 Oct 2015 09:15:21 +0100
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> 
> 2/ skipping pages on demand is not as easy as only writing on lower
> pages of each pair. As you might know, when skipping pages to secure
> your data, you'll also have to skip some lower pages so that you end up
> with an offset to a memory region that can be contiguously written to,
> and when you skip those lower pages, you have to write on it, because
> NAND chips require that the lower page of each pair be programmed
> before the higher one (ignoring this will just render some pages
> unreliable).
> 
> 3/ UBIFS is really picky when it comes to corrupted nodes detection,
> and there are a few cases where it refuses to mount the FS when a
> corrupted node is detected. One of this case is when the corrupted
> page (filled with one or several nodes) is filled with non-ff data,

I meant, "One of these cases is when the corrupted page is followed by
a page filled with non-ff data".

> which is likely to happen with MLC NANDs (paired pages are not
> contiguous). We discussed about relaxing this policy a few weeks ago,
> but what should we do when such a corruption is detected? Drop all
> nodes with a sequence higher or equal to the last valid node on the
> LEB?
> Note that with the consolidation-GC approach we don't have this
> problem because the consolidate LEB is added to journal after it has
> been completely filled with data, and marked as full (->free = 0) so
> that nobody can reclaim it to write data on it.

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15     ` Boris Brezillon
  2015-10-30  8:21       ` Boris Brezillon
@ 2015-10-30  8:50       ` Bean Huo 霍斌斌 (beanhuo)
  2015-10-30  9:08       ` Artem Bityutskiy
  2 siblings, 0 replies; 43+ messages in thread
From: Bean Huo 霍斌斌 (beanhuo) @ 2015-10-30  8:50 UTC (permalink / raw)
  To: Boris Brezillon, Artem Bityutskiy
  Cc: Richard Weinberger, linux-mtd@lists.infradead.org,
	David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler,
	Jeff Lauruhn (jlauruhn)


> Hi Artem,
> 
> Don't take the following answer as a try to teach you how UBI/UBIFS work or
> should work with MLC NANDs. I still listen to your suggestions, but when I had
> a look at how this "skip pages on demand" approach could be implemented I
> realized it was not so simple.
> 
> Also, if you don't mind I'd like to finish my consolidation-GC implementation
> before trying a new approach, which don't mean I won't consider the "skip
> pages on demand" one.

"Skip page" is a really tough solution to code in UBI. Generally, I
know that page skipping is implemented in the FTL layer to solve the
MLC paired-page issue, where it is easy to code.
Is it necessary to implement it in Linux? I don't know.

> On Wed, 28 Oct 2015 14:24:45 +0200
> Artem Bityutskiy <dedekind1@gmail.com> wrote:
> 
> > On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> > > I decided to go for the simplest solution (but I can't promise I
> > > won't change my mind if this approach appears to be wrong), which is
> > > either using a LEB in MLC or SLC mode. In SLC mode, only the first
> > > page of each pair is used, which completely addresses the paired pages
> > > problem.
> > > For now the SLC mode logic is hidden in the MTD/NAND layers which
> > > are providing functions to write/read in SLC mode.
> >
> > Most of the writes go through the journalling subsystem.
> >
> > There are some non-journal writes, related to internal meta-data
> > management, from other subsystems: log, the master node, LPT,
> > index, GC.
> >
> > In case of journal subsystem, in MLC mode you just skip pages every
> > time the "flush write-buffer" API call is used.
> >
> > In LPT subsystem, you invent a custom solution, skip pages as needed.
> >
> > In master - probably nothing needs to be done, since we have 2 copies.
> >
> > Index, GC - data also goes via journal, so the journal subsystem
> > solution will probably cover it.
> 
> For the general concept I agree that it should probably work, but here are my
> concerns (maybe you'll prove me wrong ;-)):
> 
> 1/ will you ever be able to use a full LEB without skipping any pages?
> I mean, when using "skip pages on demand" you can easily have more than
> half the pages in your LEB skipped, because when you write only one page,
> you'll have to skip between 3 and 8 pages (it depends on the pairing scheme).
> I'll try to gather some statistics on how often wbufs are synced to see if
> that's a real problem.
> The consolidation approach has the advantage of being able to consolidate
> existing LEBs to completely fill them, but the consolidation stuff could
> probably work with the "skip pages on demand" approach too.
> 
> 2/ skipping pages on demand is not as easy as only writing on the lower
> pages of each pair. As you might know, when skipping pages to secure your
> data, you'll also have to skip some lower pages so that you end up with an
> offset to a memory region that can be contiguously written to, and when you
> skip those lower pages, you have to write to them, because NAND chips
> require that the lower page of each pair be programmed before the higher
> one (ignoring this will just render some pages unreliable).
> 
> 3/ UBIFS is really picky when it comes to corrupted node detection, and there
> are a few cases where it refuses to mount the FS when a corrupted node is
> detected. One such case is when the corrupted page (filled with one or
> several nodes) is followed by a page filled with non-ff data, which is likely
> to happen with MLC NANDs (paired pages are not contiguous). We discussed
> relaxing this policy a few weeks ago, but what should we do when such a
> corruption is detected? Drop all nodes with a sequence number higher than
> or equal to that of the last valid node on the LEB?
> Note that with the consolidation-GC approach we don't have this problem
> because the consolidated LEB is added to the journal after it has been
> completely filled with data, and marked as full (->free = 0) so that nobody
> can reclaim it to write data on it.
> 
> >
> >
> > > Thanks to this differentiation, UBI is now exposing two kinds of LEBs:
> > > - the secure (small) LEBs (those accessed in SLC mode)
> > > - the unsecure (big) LEBs (those accessed in MLC mode)
> >
> > Is this really necessary? Feels like a bit of over-complication to the
> > UBI layer.
> 
> Hm, it's actually not so complicated: SLC mode is implemented by the NAND
> layer and UBI is just using MTD functions to access the NAND in SLC mode. I'm
> more concerned by the on-flash format changes problem raised by Richard.
> 
> >
> > Can UBI care about itself WRT MLC safeness, and let UBIFS care about
> > itself?
> >
> 
> Sorry but I don't agree here. By exposing the secure LEB concept, UBI does not
> specifically care about UBIFS, it just provides a way for all UBI users to address
> the problem brought by paired pages in a generic way.
> Maybe the secure LEB approach is wrong, but in the end UBI will expose other
> functions to handle those paired-page problems
> (ubi_secure_data() to skip pages for example), and this layering
> (NAND/MTD/UBI/UBIFS) is IMO the only sane way to let each layer handle
> what it's supposed to handle and let the upper layers use the new features to
> mitigate the problems.
> So, no matter which solution is chosen, it will impact the UBI, MTD, and NAND
> layers.
> 
> Best Regards,
> 
> Boris
> 
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15     ` Boris Brezillon
  2015-10-30  8:21       ` Boris Brezillon
  2015-10-30  8:50       ` Bean Huo 霍斌斌 (beanhuo)
@ 2015-10-30  9:08       ` Artem Bityutskiy
  2015-10-30  9:45         ` Boris Brezillon
  2 siblings, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30  9:08 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> Hi Artem,
> 
> Don't take the following answer as an attempt to teach you how UBI/UBIFS
> work or should work with MLC NANDs. I'm still listening to your suggestions,
> but when I had a look at how this "skip pages on demand" approach could
> be implemented I realized it was not so simple.

Sure.

Could you verify my understanding please.

You realized that "skip on demand" is not easy, and you suggest that we
simply write all the data twice - first time we skip pages, and then we
garbage collect everything. At the end, roughly speaking, we trade off
half of the IO speed, power, and NAND lifetime.

About secure LEBs - do you suggest UBI exposes 2 different LEB sizes at
the same time - secure and unsecure - or could it be only in one of
the modes?

Thanks.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  9:08       ` Artem Bityutskiy
@ 2015-10-30  9:45         ` Boris Brezillon
  2015-10-30 10:09           ` Artem Bityutskiy
  2015-10-30 11:43           ` Artem Bityutskiy
  0 siblings, 2 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30  9:45 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 30 Oct 2015 11:08:10 +0200
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> > Hi Artem,
> > 
> > Don't take the following answer as a try to teach you how UBI/UBIFS
> > work
> > or should work with MLC NANDs. I still listen to your suggestions,
> > but
> > when I had a look at how this "skip pages on demand" approach could
> > be implemented I realized it was not so simple.
> 
> Sure.
> 
> Could you verify my understanding please.
> 
> You realized that "skip on demand" is not easy, and you suggest that we
> simply write all the data twice - first time we skip pages, and then we
> garbage collect everything. At the end, roughly speaking, we trade off
> half of the IO speed, power, and NAND lifetime.

That will be pretty much the same with the "skip on demand" approach,
because you'll probably lose a lot of space when syncing the wbuf.
Remember that you have to skip between 3 and 8 pages, so if your
buffers are regularly synced (either manually or by the timer), you'll
increase the dirty space in those LEBs, and in the end you'll just rely
on the regular GC to collect those partially written LEBs. Except that
the regular GC will in turn lose some space when syncing its wbuf.

Moreover, the standard GC only takes place when you can't find a free
LEB anymore, which will probably happen when you reach something close
to half the partition size in case of MLC chips (it may be a bit
higher if you managed to occupy more than half of each LEB capacity).
This means that your FS will become slower when you reach this limit,
though maybe this can be addressed by triggering the GC before we run
out of free LEBs.

> 
> About secure LEBs - do you suggest UBI exposes 2 different LEB sizes at
> the same time - secure and unsecure - or could it be only in one of
> the modes?

A given LEB can only be in secure or unsecure mode, but a UBI volume
can expose both unsecure and secure LEBs, and those LEBs have different
sizes.
The secure/unsecure mode is chosen when mapping the LEB, and the LEB
stays in this mode until it's unmapped.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  9:45         ` Boris Brezillon
@ 2015-10-30 10:09           ` Artem Bityutskiy
  2015-10-30 11:49             ` Michal Suchanek
  2015-10-30 11:43           ` Artem Bityutskiy
  1 sibling, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 10:09 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
> On Fri, 30 Oct 2015 11:08:10 +0200
> Artem Bityutskiy <dedekind1@gmail.com> wrote:
> 
> > On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> > > Hi Artem,
> > > 
> > > Don't take the following answer as a try to teach you how
> > > UBI/UBIFS
> > > work
> > > or should work with MLC NANDs. I still listen to your
> > > suggestions,
> > > but
> > > when I had a look at how this "skip pages on demand" approach
> > > could
> > > be implemented I realized it was not so simple.
> > 
> > Sure.
> > 
> > Could you verify my understanding please.
> > 
> > You realized that "skip on demand" is not easy, and you suggest
> > that we
> > simply write all the data twice - first time we skip pages, and
> > then we
> > garbage collect everything. At the end, roughly speaking, we trade
> > off
> > half of the IO speed, power, and NAND lifetime.

So I guess the answer is generally "yes", right? I just want to be
clear about the trade-off.

> That will be pretty much the same with the "skip on demand" approach,
> because you'll probably loose a lot of space when syncing the wbuf.

The write-buffer is designed to optimize space usage. Instead of wasting
the rest of the NAND page, we wait for more data to arrive and put it
in the same NAND page with the previous piece of data.

This suggests that we do not sync it too often, or at least that
efforts were taken not to do this.

Off the top of my head, we sync the write-buffer (AKA wbuf) in these
cases:
1. Journal commit, which happens once in a while, depending on journal
size.
2. User-initiated sync, like fsync(), sync(), remount, etc.
3. Write-buffer timer, which fires when there were no writes within a
certain interval, like 5 seconds. The timeout can be tuned.
4. Other situations like the end of GC, etc. - these are related to
metadata management.

Now, imagine you are writing a lot of data, like uncompressing a big
tarball, or compressing, or just backing up your /home. In this
situation you have a continuous flow of data from VFS to UBIFS.

UBIFS will keep writing the data to the journal, and there won't be any
wbuf syncs. The syncs will happen only on journal commit. So you end up
with LEBs full of data and not requiring any GC.

But yes, if we are talking about, say, an idle system, which
occasionally writes something, there will be a wbuf sync after every
write.

So in the "I need all your capacity" kind of situations where IO speed
matters, and there is a lot of data written - we'd be optimal, no
double writes.

In the "I am mostly idle" type of situations we'll do double writes.

SIGLUNCH, colleagues waiting, sorry, I guess I wrote enough :-)

> A given LEB can only be in secure or unsecure mode, but a UBI volume
> can expose both unsecure and secure LEBs, and those LEBs have
> different
> sizes.
> The secure/unsecure mode is chosen when mapping the LEB, and the LEB
> stays in this mode until it's unmapped.

This is not going to be a little value-add to UBI; this is going to be
a big change in my opinion. If UBIFS ends up using this - it may be worth
the effort. Otherwise, I'd argue that this would need an important customer
to be worth the effort.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  9:45         ` Boris Brezillon
  2015-10-30 10:09           ` Artem Bityutskiy
@ 2015-10-30 11:43           ` Artem Bityutskiy
  2015-10-30 11:59             ` Richard Weinberger
  2015-10-30 12:30             ` Boris Brezillon
  1 sibling, 2 replies; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 11:43 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
> Moreover, the standard GC only takes place when you can't find a free
> LEB anymore, which will probably happen when you reach something
> close
> to half the partition size in case of MLC chips (it may be a bit
> higher if you managed to occupy more than half of each LEB capacity).
> This means that your FS will become slower when you reach this limit,
> though maybe this can be addressed by triggering the GC before we run
> out of free LEBs.

Right. I'd call it a detail. But the big picture is - if you have to GC
all the data you write, you write twice. When exactly you do the second
write is a detail - sometimes it is deferred, it is in background etc,
sometimes right away - you have to GC older data before being able to
write new data.

Now, by no means I am criticizing you or your decisions, you are doing
great job. I am more like summarizing and trying to give you some food
for thoughts. :-)


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 10:09           ` Artem Bityutskiy
@ 2015-10-30 11:49             ` Michal Suchanek
  2015-10-30 12:47               ` Artem Bityutskiy
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Suchanek @ 2015-10-30 11:49 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Boris Brezillon, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Richard Weinberger, Andrea Scian, MTD Maling List, Brian Norris,
	David Woodhouse, Bean Huo 霍斌斌 (beanhuo)

On 30 October 2015 at 11:09, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
>> On Fri, 30 Oct 2015 11:08:10 +0200
>> Artem Bityutskiy <dedekind1@gmail.com> wrote:
>>
>> > On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
>> > > Hi Artem,
>> > >
>> > > Don't take the following answer as a try to teach you how
>> > > UBI/UBIFS
>> > > work
>> > > or should work with MLC NANDs. I still listen to your
>> > > suggestions,
>> > > but
>> > > when I had a look at how this "skip pages on demand" approach
>> > > could
>> > > be implemented I realized it was not so simple.
>> >
>> > Sure.
>> >
>> > Could you verify my understanding please.
>> >
>> > You realized that "skip on demand" is not easy, and you suggest
>> > that we
>> > simply write all the data twice - first time we skip pages, and
>> > then we
>> > garbage collect everything. At the end, roughly speaking, we trade
>> > off
>> > half of the IO speed, power, and NAND lifetime.
>
> So I guess the answer is generally "yes", right? I just want to be
> clear about the trade-off.
>
>> That will be pretty much the same with the "skip on demand" approach,
>> because you'll probably lose a lot of space when syncing the wbuf.
>
> The write-buffer is designed to optimize space usage. Instead of wasting
> the rest of the NAND page, we wait for more data to arrive and put it
> in the same NAND page with the previous piece of data.
>
> This suggests that we do not sync it too often, or at least that the
> efforts were taken not to do this.
>
> Off the top of my head, we sync the write-buffer (AKA wbuf) in these
> cases:
> 1. Journal commit, which happens once in a while, depends on journal
> size.
> 2. User-initiated sync, like fsync(), sync(), remount, etc.
> 3. Write-buffer timer, which fires when there were no writes within a
> certain interval, like 5 seconds. The timeout can be tuned.
> 4. Other situations like the end of GC, etc. - these are related to
> metadata management.
>
> Now, imagine you writing a lot of data, like uncompressing a big
> tarball, or compressing, or just backing up your /home. In this
> situation you have a continuous flow of data from VFS to UBIFS.
>
> UBIFS will keep writing the data to the journal, and there won't be any
> wbuf syncs. The syncs will happen only on journal commit. So you end up
> with LEBs full of data and not requiring any GC.

Actually, since there is no guarantee that the data ever gets written
(or an error reported in case it cannot be) unless you fsync() every file
before close(), any sane uncompressor, backup program, etc. will
fsync() every file written regardless of its size. So if your home has
a lot of configuration files and sources, this will not be a nice
continuous stream of data in many cases.

IIRC scp(1) does not fsync(), potentially resulting in large amounts of
data getting silently lost. For that reason and others it should be
renamed to iscp.

Thanks

Michal


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 11:43           ` Artem Bityutskiy
@ 2015-10-30 11:59             ` Richard Weinberger
  2015-10-30 12:29               ` Artem Bityutskiy
  2015-10-30 12:30             ` Boris Brezillon
  1 sibling, 1 reply; 43+ messages in thread
From: Richard Weinberger @ 2015-10-30 11:59 UTC (permalink / raw)
  To: dedekind1, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

Am 30.10.2015 um 12:43 schrieb Artem Bityutskiy:
> On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
>> Moreover, the standard GC only takes place when you can't find a free
>> LEB anymore, which will probably happen when you reach something
>> close
>> to half the partition size in case of MLC chips (it may be a bit
>> higher if you managed to occupy more than half of each LEB capacity).
>> This means that your FS will become slower when you reach this limit,
>> though maybe this can be addressed by triggering the GC before we run
>> out of free LEBs.
> 
> Right. I'd call it a detail. But the big picture is - if you have to GC
> all the data you write, you write twice. When exactly you do the second
> write is a detail - sometimes it is deferred, it is in background etc,
> sometimes right away - you have to GC older data before being able to
> write new data.

That is a valid concern.
But to me the idea sounds promising and is worth a try.
We will stress test it and figure how much the actual overhead is.

Thanks,
//richard


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 11:59             ` Richard Weinberger
@ 2015-10-30 12:29               ` Artem Bityutskiy
  2015-10-30 12:31                 ` Bityutskiy, Artem
  0 siblings, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 12:29 UTC (permalink / raw)
  To: Richard Weinberger, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 12:59 +0100, Richard Weinberger wrote:
> That is a valid concern.
> But to me the idea sounds promising and is worth a try.
> We will stress test it and figure how much the actual overhead is.

Well, for me the question of "do we double-write or try to do a better
job" is more of a fundamental question, not a concern.

Right now I personally do not share the opinion that doing a better job
is hard and double writing is easy. I may be wrong though. So for me,
"hey, we'll just write twice, it is worth a try" does not look like
a good starting point.

And then "hey, we did not try to do a better job, so we write twice,
let's have this upstream" is a strong position.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 11:43           ` Artem Bityutskiy
  2015-10-30 11:59             ` Richard Weinberger
@ 2015-10-30 12:30             ` Boris Brezillon
  2015-10-30 12:41               ` Artem Bityutskiy
  1 sibling, 1 reply; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30 12:30 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 30 Oct 2015 13:43:15 +0200
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
> > Moreover, the standard GC only takes place when you can't find a free
> > LEB anymore, which will probably happen when you reach something
> > close
> > to half the partition size in case of MLC chips (it may be a bit
> > higher if you managed to occupy more than half of each LEB capacity).
> > This means that your FS will become slower when you reach this limit,
> > though maybe this can be addressed by triggering the GC before we run
> > out of free LEBs.
> 
> Right. I'd call it a detail. But the big picture is - if you have to GC
> all the data you write, you write twice. When exactly you do the second
> write is a detail - sometimes it is deferred, it is in background etc,
> sometimes right away - you have to GC older data before being able to
> write new data.

You're right, but it makes a big difference when all your writes are
taking longer because you need to run the GC to retrieve a free LEB, and
this is probably what's gonna happen when your FS reaches ~1/2 its
maximum size. Doing it in the background (collecting a few valid nodes on
each GC step and letting user operations take place between each of
these steps) should help mitigate this problem.

> 
> Now, by no means I am criticizing you or your decisions, you are doing
> great job. I am more like summarizing and trying to give you some food
> for thoughts. :-)

No problem, I don't take it personally. I actually think arguing
on technical stuff is a good way to find the best solution ;-).

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 12:29               ` Artem Bityutskiy
@ 2015-10-30 12:31                 ` Bityutskiy, Artem
  0 siblings, 0 replies; 43+ messages in thread
From: Bityutskiy, Artem @ 2015-10-30 12:31 UTC (permalink / raw)
  To: boris.brezillon@free-electrons.com, richard@nod.at
  Cc: beanhuo@micron.com, computersforpeace@gmail.com,
	Iwo.Mergler@netcommwireless.com, rnd4@dave-tech.it,
	linux-mtd@lists.infradead.org, dwmw2@infradead.org,
	jlauruhn@micron.com

On Fri, 2015-10-30 at 14:29 +0200, Artem Bityutskiy wrote:
> And then "hey, we did not try to do a better job, so we write twice,
> let's have this upstream" is a strong position.

I meant NOT a strong position here, sorry.

-- 
Best Regards,
Artem Bityutskiy
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 12:30             ` Boris Brezillon
@ 2015-10-30 12:41               ` Artem Bityutskiy
  0 siblings, 0 replies; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 12:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 13:30 +0100, Boris Brezillon wrote:
> You're right, but it makes a big difference when all your writes are
> taking longer because you need to run the GC to retrieve a free LEB,
> and
> this is probably what's gonna happen when your FS reaches ~1/2 its
> maximum size. Doing it in the background (collecting a few valid nodes on
> each GC step and letting user operations take place between each of
> these steps) should help mitigate this problem.

It makes a difference, yes. However, again, the worst-case scenario is
that whenever I need to write, I have to do GC, because I am "punished" by
previous writes. The worst-case scenario is writes that are twice as slow.

Guaranteed twice-as-fast wear is the other implication.

Increased power consumption is another one. Not every embedded system
will find the "you have to do a lot of job in background" UBIFS feature
attractive.

Anyway, could you spend a bit more time trying to provide convincing
arguments that doing "skip on demand" is hard, or does not gain
anything? You expressed this opinion, but so far it did not look 100%
convincing.

Thanks!


* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30 11:49             ` Michal Suchanek
@ 2015-10-30 12:47               ` Artem Bityutskiy
  0 siblings, 0 replies; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 12:47 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Boris Brezillon, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Richard Weinberger, Andrea Scian, MTD Maling List, Brian Norris,
	David Woodhouse, Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 12:49 +0100, Michal Suchanek wrote:
> Actually, since there is no guarantee that the data ever gets written
> (or error reported in case it cannot) unless you fsync() every file
> before close() any sane uncompressor, backup program, etc. will
> fsync() every file written regardless of its size. So if your home
> has
> a lot of configuration files and sources this will not be nice
> continuous stream of data in many cases.
> 
> IIRC scp(1) does not fsync() potentially resulting in large amounts
> of
> data getting silently lost. For that reason and others it should be
> renamed to iscp.

This is true. If you get a more or less steady stream of incoming writes
without syncs in between, you do not need to sync the write-buffer, and you
do not need to skip pages to cover the MLC paired pages. In these
situations we do not have to end up with double writing.

If there are no writes for some time, it is a good idea to sync. VFS will
write back by timeout, and UBIFS will flush the write-buffer by timeout.


end of thread, other threads:[~2015-10-30 12:48 UTC | newest]

Thread overview: 43+ messages
2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
2015-09-17 15:20 ` Artem Bityutskiy
2015-09-17 15:46   ` Boris Brezillon
2015-09-17 16:47     ` Richard Weinberger
2015-09-18  7:17       ` Andrea Scian
2015-09-18  7:41         ` Boris Brezillon
2015-09-18  7:54         ` Artem Bityutskiy
2015-09-18  7:57           ` Bityutskiy, Artem
2015-09-18  9:38           ` Andrea Scian
2015-09-24  1:57             ` Karl Zhang 张双锣 (karlzhang)
2015-09-24  6:31               ` Richard Weinberger
2015-09-24  7:43               ` Boris Brezillon
2015-09-24  9:44                 ` Stefan Roese
2015-09-29 11:19 ` Richard Weinberger
2015-09-29 12:51   ` Boris Brezillon
2015-10-23  8:14 ` Boris Brezillon
2015-10-27 20:16   ` Richard Weinberger
2015-10-28  9:24     ` Boris Brezillon
2015-10-28 10:44       ` Michal Suchanek
2015-10-28 11:14         ` Boris Brezillon
2015-10-28 15:50           ` Michal Suchanek
2015-10-28 12:24   ` Artem Bityutskiy
2015-10-30  8:15     ` Boris Brezillon
2015-10-30  8:21       ` Boris Brezillon
2015-10-30  8:50       ` Bean Huo 霍斌斌 (beanhuo)
2015-10-30  9:08       ` Artem Bityutskiy
2015-10-30  9:45         ` Boris Brezillon
2015-10-30 10:09           ` Artem Bityutskiy
2015-10-30 11:49             ` Michal Suchanek
2015-10-30 12:47               ` Artem Bityutskiy
2015-10-30 11:43           ` Artem Bityutskiy
2015-10-30 11:59             ` Richard Weinberger
2015-10-30 12:29               ` Artem Bityutskiy
2015-10-30 12:31                 ` Bityutskiy, Artem
2015-10-30 12:30             ` Boris Brezillon
2015-10-30 12:41               ` Artem Bityutskiy
2015-10-28 12:06 ` Artem Bityutskiy
     [not found] <A765B125120D1346A63912DDE6D8B6310BF4CAA8@NTXXIAMBX02.xacn.micron.com>
2015-09-25  7:30 ` Boris Brezillon
2015-09-25  8:25   ` Bean Huo 霍斌斌 (beanhuo)
2015-09-25  8:35     ` Richard Weinberger
2015-09-25  8:48     ` Boris Brezillon
2015-09-25  8:30   ` Karl Zhang 张双锣 (karlzhang)
2015-09-25  8:56     ` Boris Brezillon
