* since when does ARM map the kernel memory in sections?
@ 2011-04-12 18:52 Peter Wächtler
  2011-04-12 19:11 ` Colin Cross
  ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Peter Wächtler @ 2011-04-12 18:52 UTC (permalink / raw)
To: linux-arm-kernel

Hello Linux ARM developers,

did the ARM Linux 2.6 kernel map the kernel memory in pages in the past?
Or was the memory always mapped in sections?

I still have to chase a potential memory corruption. The rootfs is located on
an SD card and gets corrupted even when the filesystem test programs write to
different partitions.
The test scenario includes several dozen or even hundreds of warm and cold
boot sequences, and file system write tests with sudden soft resets. It's a large
embedded project with a lot of drivers, and the fact that it is always the rootfs and
often the superblock that gets damaged makes me think of a memory corruption.

Peter

^ permalink raw reply [flat|nested] 38+ messages in thread
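For readers unfamiliar with the terminology in the question: on ARM, a "section" is a 1 MB mapping described directly by a first-level page-table entry, as opposed to a 4 KB small-page mapping that goes through a second-level table. Below is a minimal user-space sketch (illustrative only, not kernel code; the attribute-bit layout is simplified) of what an ARM short-descriptor first-level section entry looks like:

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE (1u << 20) /* each first-level section entry maps 1 MB */

/* Build a short-descriptor first-level "section" entry:
 * bits [31:20] hold the physical section base, bits [1:0] are 0b10
 * to mark the entry as a section.  Attribute bits (AP, domain, C, B,
 * TEX, ...) live in between and are passed through simplified here. */
static uint32_t section_desc(uint32_t phys, uint32_t attrs)
{
    return (phys & 0xFFF00000u) | (attrs & 0x000FFFFCu) | 0x2u;
}
```

Because a section entry covers a whole megabyte, attributes such as cacheability can only be changed per megabyte; that is why switching the kernel's linear map to page mappings (as discussed below) is needed to change attributes on individual 4 KB pages.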
* since when does ARM map the kernel memory in sections?
  2011-04-12 18:52 since when does ARM map the kernel memory in sections? Peter Wächtler
@ 2011-04-12 19:11 ` Colin Cross
  2011-04-13 18:19   ` Peter Wächtler
  2011-04-12 19:20 ` Andrei Warkentin
  2011-04-12 20:15 ` Russell King - ARM Linux
  2 siblings, 1 reply; 38+ messages in thread
From: Colin Cross @ 2011-04-12 19:11 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Apr 12, 2011 at 11:52 AM, Peter Wächtler <pwaechtler@mac.com> wrote:
> Hello Linux ARM developers,
>
> did the ARM Linux 2.6 kernel map the kernel memory in pages in the past?
> Or was the memory always mapped in sections?
>
> I still have to chase a potential memory corruption. The rootfs is located on
> a SDcard and gets corrupted even when the filesystem test programs write to
> different partitions.
> The test scenario includes several dozen or even hundreds of warm and cold
> boot sequences, file system write tests with sudden soft resets. It's a large
> embedded project with a lot of drivers and the fact that always the rootfs and
> often the superblock gets damaged let me think of a memory corruption.

Gary King posted some patches a while ago that switched the kernel
back to page mappings, so he could modify attributes on some pages to
be non-cacheable.  The patches were not accepted, but you can probably
dig them up for testing.

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-12 19:11 ` Colin Cross
@ 2011-04-13 18:19   ` Peter Wächtler
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Wächtler @ 2011-04-13 18:19 UTC (permalink / raw)
To: linux-arm-kernel

On Tuesday, 12 April 2011 at 21:11:59, Colin Cross wrote:
> On Tue, Apr 12, 2011 at 11:52 AM, Peter Wächtler <pwaechtler@mac.com> wrote:
> > Hello Linux ARM developers,
> >
> > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past?
> > Or was the memory always mapped in sections?
> >
> > I still have to chase a potential memory corruption. The rootfs is
> > located on a SDcard and gets corrupted even when the filesystem test
> > programs write to different partitions.
> > The test scenario includes several dozen or even hundreds of warm and
> > cold boot sequences, file system write tests with sudden soft resets.
> > It's a large embedded project with a lot of drivers and the fact that
> > always the rootfs and often the superblock gets damaged let me think of
> > a memory corruption.
>
> Gary King posted some patches a while ago that switched the kernel
> back to page mappings, so he could modify attributes on some pages to
> be non-cacheable.  The patches were not accepted, but you can probably
> dig them up for testing.

Thanks for the tip. I already found the patch and applied it.
There were some problems with the config, but it is not tested yet...

Peter

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-12 18:52 since when does ARM map the kernel memory in sections? Peter Wächtler
  2011-04-12 19:11 ` Colin Cross
@ 2011-04-12 19:20 ` Andrei Warkentin
  2011-04-12 20:33   ` Jamie Lokier
  2011-04-13  6:51   ` Peter Wächtler
  2011-04-12 20:15 ` Russell King - ARM Linux
  2 siblings, 2 replies; 38+ messages in thread
From: Andrei Warkentin @ 2011-04-12 19:20 UTC (permalink / raw)
To: linux-arm-kernel

Hi Peter,

2011/4/12 Peter Wächtler <pwaechtler@mac.com>:
> Hello Linux ARM developers,
>
> did the ARM Linux 2.6 kernel map the kernel memory in pages in the past?
> Or was the memory always mapped in sections?
>
> I still have to chase a potential memory corruption. The rootfs is located on
> a SDcard and gets corrupted even when the filesystem test programs write to
> different partitions.
> The test scenario includes several dozen or even hundreds of warm and cold
> boot sequences, file system write tests with sudden soft resets. It's a large
> embedded project with a lot of drivers and the fact that always the rootfs and
> often the superblock gets damaged let me think of a memory corruption.

Sorry, I don't want to state the obvious, but you mentioned sudden resets
while writing, which is almost always going to wind up as fs corruption,
with the severity depending on the level of caching the system is doing
on the writes.
How are you mounting your rootfs and what file system are you using?
What sort of corruptions to the super block are you seeing?

A

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-12 19:20 ` Andrei Warkentin @ 2011-04-12 20:33 ` Jamie Lokier 2011-04-13 15:27 ` Nicolas Pitre 2011-04-18 13:52 ` Pavel Machek 2011-04-13 6:51 ` Peter Wächtler 1 sibling, 2 replies; 38+ messages in thread From: Jamie Lokier @ 2011-04-12 20:33 UTC (permalink / raw) To: linux-arm-kernel Andrei Warkentin wrote: > Hi Peter, > > 2011/4/12 Peter W?chtler <pwaechtler@mac.com>: > > Hello Linux ARM developers, > > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > Or was the memory always mapped in sections? > > > > I still have to chase a potential memory corruption. The rootfs is located on > > a SDcard and gets corrupted even when the filesystem test programs write to > > different partitions. > > The test scenario includes several dozen or even hundreds of warm and cold > > boot sequences, file system write tests with sudden soft resets. It's a large > > embedded project with a lot of drivers and the fact that always the rootfs and > > often the superblock gets damaged let me think of a memory corruption. > > > > Sorry, I don't want to be obvious, but you mentioned sudden resets > while writing, which is almost always going to wind > up as fs corruptions, with the severity depending on the level of > caching the system is doing to the writes. > How are you mounting your rootfs and what file system are you using? > What sort of corruptions to the super block are you seeing? If everything is implemented correctly, that depends on the type of filesystem, block layer and storage. Some are explicitly designed to be safe against sudden reboots and power failure - which is an important feature of systems where removing the power is how they are turned off at night. -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-12 20:33 ` Jamie Lokier @ 2011-04-13 15:27 ` Nicolas Pitre 2011-04-13 20:11 ` Jamie Lokier 2011-04-18 13:52 ` Pavel Machek 1 sibling, 1 reply; 38+ messages in thread From: Nicolas Pitre @ 2011-04-13 15:27 UTC (permalink / raw) To: linux-arm-kernel On Tue, 12 Apr 2011, Jamie Lokier wrote: > Andrei Warkentin wrote: > > Hi Peter, > > > > 2011/4/12 Peter W?chtler <pwaechtler@mac.com>: > > > Hello Linux ARM developers, > > > > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > > Or was the memory always mapped in sections? > > > > > > I still have to chase a potential memory corruption. The rootfs is located on > > > a SDcard and gets corrupted even when the filesystem test programs write to > > > different partitions. > > > The test scenario includes several dozen or even hundreds of warm and cold > > > boot sequences, file system write tests with sudden soft resets. It's a large > > > embedded project with a lot of drivers and the fact that always the rootfs and > > > often the superblock gets damaged let me think of a memory corruption. > > > > > > > Sorry, I don't want to be obvious, but you mentioned sudden resets > > while writing, which is almost always going to wind > > up as fs corruptions, with the severity depending on the level of > > caching the system is doing to the writes. > > How are you mounting your rootfs and what file system are you using? > > What sort of corruptions to the super block are you seeing? > > If everything is implemented correctly, that depends on the type of > filesystem, block layer and storage. Some are explicitly designed to > be safe against sudden reboots and power failure - which is an > important feature of systems where removing the power is how they are > turned off at night. SD was mentioned as being the storage medium in this thread. I really doubt SD cards are designed to be safe against sudden power outages. 
Nicolas ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-13 15:27 ` Nicolas Pitre @ 2011-04-13 20:11 ` Jamie Lokier 0 siblings, 0 replies; 38+ messages in thread From: Jamie Lokier @ 2011-04-13 20:11 UTC (permalink / raw) To: linux-arm-kernel Nicolas Pitre wrote: > On Tue, 12 Apr 2011, Jamie Lokier wrote: > > > Andrei Warkentin wrote: > > > Hi Peter, > > > > > > 2011/4/12 Peter W?chtler <pwaechtler@mac.com>: > > > > Hello Linux ARM developers, > > > > > > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > > > Or was the memory always mapped in sections? > > > > > > > > I still have to chase a potential memory corruption. The rootfs is located on > > > > a SDcard and gets corrupted even when the filesystem test programs write to > > > > different partitions. > > > > The test scenario includes several dozen or even hundreds of warm and cold > > > > boot sequences, file system write tests with sudden soft resets. It's a large > > > > embedded project with a lot of drivers and the fact that always the rootfs and > > > > often the superblock gets damaged let me think of a memory corruption. > > > > > > > > > > Sorry, I don't want to be obvious, but you mentioned sudden resets > > > while writing, which is almost always going to wind > > > up as fs corruptions, with the severity depending on the level of > > > caching the system is doing to the writes. > > > How are you mounting your rootfs and what file system are you using? > > > What sort of corruptions to the super block are you seeing? > > > > If everything is implemented correctly, that depends on the type of > > filesystem, block layer and storage. Some are explicitly designed to > > be safe against sudden reboots and power failure - which is an > > important feature of systems where removing the power is how they are > > turned off at night. > > SD was mentioned as being the storage medium in this thread. 
> I really doubt SD cards are designed to be safe against sudden power
> outages.

Ah, fair enough, you are sadly almost certainly right.

(Some flash even corrupts itself on power removal during reads too, due
to internal reorganisation.  I have no idea if SD is in that category.
It is quite a problem when searching for something to replace an
old-fashioned hard disk in an application where removing the power is
normal daily usage, and traditional journalling filesystems used to be
effective.)

-- Jamie

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-12 20:33 ` Jamie Lokier 2011-04-13 15:27 ` Nicolas Pitre @ 2011-04-18 13:52 ` Pavel Machek 2011-04-18 17:07 ` Jamie Lokier 2011-04-18 19:21 ` Peter Waechtler 1 sibling, 2 replies; 38+ messages in thread From: Pavel Machek @ 2011-04-18 13:52 UTC (permalink / raw) To: linux-arm-kernel Hi! > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > > Or was the memory always mapped in sections? > > > > > > I still have to chase a potential memory corruption. The rootfs is located on > > > a SDcard and gets corrupted even when the filesystem test programs write to > > > different partitions. > > > The test scenario includes several dozen or even hundreds of warm and cold > > > boot sequences, file system write tests with sudden soft resets. It's a large > > > embedded project with a lot of drivers and the fact that always the rootfs and > > > often the superblock gets damaged let me think of a memory corruption. > > > > > > > Sorry, I don't want to be obvious, but you mentioned sudden resets > > while writing, which is almost always going to wind > > up as fs corruptions, with the severity depending on the level of > > caching the system is doing to the writes. > > How are you mounting your rootfs and what file system are you using? > > What sort of corruptions to the super block are you seeing? > > If everything is implemented correctly, that depends on the type of > filesystem, block layer and storage. Some are explicitly designed to > be safe against sudden reboots and power failure - which is an > important feature of systems where removing the power is how they are > turned off at night. ...but note that no existing filesystem is safe on media such as usb sticks, SD and CF cards... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-18 13:52 ` Pavel Machek
@ 2011-04-18 17:07   ` Jamie Lokier
  2011-04-18 17:17     ` Nicolas Pitre
  2011-04-22 15:47     ` Pavel Machek
  1 sibling, 2 replies; 38+ messages in thread
From: Jamie Lokier @ 2011-04-18 17:07 UTC (permalink / raw)
To: linux-arm-kernel

Pavel Machek wrote:
> ...but note that no existing filesystem is safe on media such
> as usb sticks, SD and CF cards...

Sadly.  Do you know if enough information is available by now to
make it possible, even in principle?

Alternatively, do you know if there's been much progress in
characterising particular manufacturers/parts/brands or identifiable
characteristics etc., to find ones on which a power-fail-safe
filesystem is possible, even if it's just in principle for the moment?

Sadly I know of a large number of systems using CF or cheap ATA SSDs
that were installed assuming ext3 is reliable when they are powered
off at the wall every day.  That's because a lot of people assume they
are solid-state drop-in replacements for hard disks, as that is how
they are advertised.

-- Jamie

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-18 17:07 ` Jamie Lokier @ 2011-04-18 17:17 ` Nicolas Pitre 2011-04-22 15:47 ` Pavel Machek 1 sibling, 0 replies; 38+ messages in thread From: Nicolas Pitre @ 2011-04-18 17:17 UTC (permalink / raw) To: linux-arm-kernel On Mon, 18 Apr 2011, Jamie Lokier wrote: > Pavel Machek wrote: > > ...but note that no existing filesystem is safe on media such > > as usb sticks, SD and CF cards... > > Sadly. Do you know by now if enough information is available now to > make it possible even in principle? > > Alternatively, do you know if there's been much progress > characterising particular manufacturers/parts/brands or identifiable > characteristics etc. to find ones on which a power fail safe > filesystem is possible, even if it's just in principle for the moment? Have a look at: https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey Nicolas ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-18 17:07 ` Jamie Lokier
  2011-04-18 17:17   ` Nicolas Pitre
@ 2011-04-22 15:47   ` Pavel Machek
  2011-04-23  9:23     ` Linus Walleij
  1 sibling, 1 reply; 38+ messages in thread
From: Pavel Machek @ 2011-04-22 15:47 UTC (permalink / raw)
To: linux-arm-kernel

Hi!

> > ...but note that no existing filesystem is safe on media such
> > as usb sticks, SD and CF cards...
>
> Sadly. Do you know by now if enough information is available now to
> make it possible even in principle?
> Alternatively, do you know if there's been much progress
> characterising particular manufacturers/parts/brands or identifiable
> characteristics etc. to find ones on which a power fail safe
> filesystem is possible, even if it's just in principle for the moment?

There's a nice table somewhere. If at least the size of the "big block"
(usually 4MB) is known, a safe fs should be possible. OTOH a 4MB commit
block is not going to be very efficient.

Plus, I was told the new MMC standard has a "write reliable" option...

> Sadly I know of a large number of systems using CF or cheap ATA SSDs
> that were installed assuming ext3 is reliable when they are powered
> off at the wall every day. That's because a lot of people assume they
> are solid-state drop-in replacements for hard disks, as that is how
> they are advertised.

False advertising, I'm afraid. I wonder if SATA SSDs do better...?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-22 15:47 ` Pavel Machek
@ 2011-04-23  9:23   ` Linus Walleij
  2011-04-26 10:33     ` Per Forlin
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Walleij @ 2011-04-23 9:23 UTC (permalink / raw)
To: linux-arm-kernel

2011/4/22 Pavel Machek <pavel@ucw.cz>:

> Plus, I was told new MMC standard has "write reliable" option...

I think Per Förlin looked into reliable write. The latest eMMC cards
have this, but OTOMH it was too darn slow to be used on current
chips/"cards".

Per, Sebastian: any details?

Yours,
Linus Walleij

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-23  9:23 ` Linus Walleij
@ 2011-04-26 10:33   ` Per Forlin
  2011-04-26 19:00     ` Peter Waechtler
  2011-04-26 20:24     ` Andrei Warkentin
  0 siblings, 2 replies; 38+ messages in thread
From: Per Forlin @ 2011-04-26 10:33 UTC (permalink / raw)
To: linux-arm-kernel

On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote:
> 2011/4/22 Pavel Machek <pavel@ucw.cz>:
>
>> Plus, I was told new MMC standard has "write reliable" option...
>
> I think Per Förlin looked into reliable write. The latest eMMC cards
> has this, but OTOMH it was too darn slow to be used on current
> chips/"cards".
>
> Per, Sebastian: any details?

I had plans to add reliable writes and do benchmarking but I never got
to it. Right now I have no plans to pick it up.

> Yours,
> Linus Walleij

Regards,
Per

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections?
  2011-04-26 10:33 ` Per Forlin
@ 2011-04-26 19:00   ` Peter Waechtler
  2011-04-26 19:07     ` Jamie Lokier
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Waechtler @ 2011-04-26 19:00 UTC (permalink / raw)
To: linux-arm-kernel

On Tuesday, 26 April 2011 at 12:33:29, Per Forlin wrote:
> On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote:
> > 2011/4/22 Pavel Machek <pavel@ucw.cz>:
> >> Plus, I was told new MMC standard has "write reliable" option...
> >
> > I think Per Förlin looked into reliable write. The latest eMMC cards
> > has this, but OTOMH it was too darn slow to be used on current
> > chips/"cards".
> >
> > Per, Sebastian: any details?
>
> I had plans to add reliable writes and do benchmarking but I never got
> to it. Right now I have no plans to pick it up.
>
> > Yours,
> > Linus Walleij
>
> Regards,
> Per

As far as I understood the spec, reliable write only guarantees that
either the old data is still intact, or the new data was written
(completely).

Peter

^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-26 19:00 ` Peter Waechtler @ 2011-04-26 19:07 ` Jamie Lokier 2011-04-26 20:38 ` MMC and reliable write - was: " Peter Waechtler 0 siblings, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-04-26 19:07 UTC (permalink / raw) To: linux-arm-kernel Peter Waechtler wrote: > Am Dienstag, 26. April 2011, 12:33:29 schrieb Per Forlin: > > On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote: > > > 2011/4/22 Pavel Machek <pavel@ucw.cz>: > > >> Plus, I was told new MMC standard has "write reliable" option... > > > > > > I think Per F?rlin looked into reliable write. The latest eMMC cards > > > has this, but OTOMH it was too darn slow to be used on current > > > chips/"cards". > > > > > > Per, Sebastian: any details? > > > > I had plans to add reliable writes and do benchmarking but I never got > > to it. Right now I have no plans to pick it up. > > > > > Yours, > > > Linus Walleij > > > > Regards, > > Per > > As far as I understood the spec, reliable write only makes statements like > either the old data is still intact - or the new data was written > (completely). Hmm, if that's a correct understanding, it's not very useful for fsync() or journal barriers (unless the spec implies something barrier-like), and it would be nice if there were guarantees about the _other_ data (that isn't being written at all) not getting corrupted as a side effect. -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-26 19:07 ` Jamie Lokier @ 2011-04-26 20:38 ` Peter Waechtler 2011-04-26 22:45 ` Jamie Lokier 0 siblings, 1 reply; 38+ messages in thread From: Peter Waechtler @ 2011-04-26 20:38 UTC (permalink / raw) To: linux-arm-kernel Am Dienstag, 26. April 2011, 21:07:19 schrieb Jamie Lokier: > Peter Waechtler wrote: > > Am Dienstag, 26. April 2011, 12:33:29 schrieb Per Forlin: > > > On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote: > > > > 2011/4/22 Pavel Machek <pavel@ucw.cz>: > > > >> Plus, I was told new MMC standard has "write reliable" option... > > > > > > > > I think Per F?rlin looked into reliable write. The latest eMMC cards > > > > has this, but OTOMH it was too darn slow to be used on current > > > > chips/"cards". > > > > > > > > Per, Sebastian: any details? > > > > > > I had plans to add reliable writes and do benchmarking but I never got > > > to it. Right now I have no plans to pick it up. > > > > > > > Yours, > > > > Linus Walleij > > > > > > Regards, > > > Per > > > > As far as I understood the spec, reliable write only makes statements > > like either the old data is still intact - or the new data was written > > (completely). > > Hmm, if that's a correct understanding, it's not very useful for > fsync() or journal barriers (unless the spec implies something > barrier-like), and it would be nice if there were guarantees about the > _other_ data (that isn't being written at all) not getting corrupted > as a side effect. > > -- Jamie well, I cannot interpret this already, but it reads scary ;) JEDEC Standard No. 84-A441 Page 56 Reliable Write: Multiple block write with pre-defined block count and Reliable Write parameters. This transaction is similar to the basic pre- defined multiple-block write (defined in previous bullet) with the following exceptions. 
The old data pointed to by a logical address must remain unchanged until
the new data written to the same logical address has been successfully
programmed. This is to ensure that the target address updated by the
reliable write transaction never contains undefined data. Data must
remain valid even if a sudden power loss occurs during the programming.

There are two versions of reliable write: the legacy implementation and
the enhanced implementation. The type of reliable write supported by the
device is indicated by the EN_REL_WR bit in the WR_REL_PARAM extended
CSD register. For the case of EN_REL_WR = 0 :

More fun on page 147ff:

• WR_REL_SET [167]
The write reliability settings register indicates the reliability
setting for each of the user and general area partitions in the device.
The contents of this register are read only if the HS_CTRL_REL is 0 in
the WR_REL_PARAM extended CSD register. The default value of these bits
is not specified and is determined by the device.

It goes on with:

Bit[4]: WR_DATA_REL_4
0x0: In general purpose partition 4, the write operation has been
optimized for performance and existing data in the partition could be at
risk if a power failure occurs.
0x1: In general purpose partition 4, the device protects previously
written data if power failure occurs during a write operation.

^ permalink raw reply [flat|nested] 38+ messages in thread
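As a concrete illustration of the registers quoted in the excerpt above, a driver could test these bits roughly as follows. This is a hedged sketch, not kernel code: the byte offsets (166 for WR_REL_PARAM, 167 for WR_REL_SET) and the bit positions for HS_CTRL_REL and EN_REL_WR follow one reading of JEDEC 84-A441 and should be checked against the standard before use.

```c
#include <assert.h>
#include <stdint.h>

/* Extended CSD byte offsets, per JEDEC 84-A441 (assumed) */
#define EXT_CSD_WR_REL_PARAM 166 /* write reliability parameter register */
#define EXT_CSD_WR_REL_SET   167 /* write reliability settings register */

/* WR_REL_PARAM bits (assumed positions) */
#define WR_REL_PARAM_HS_CTRL_REL (1u << 0) /* WR_REL_SET is host-writable */
#define WR_REL_PARAM_EN_REL_WR   (1u << 2) /* enhanced reliable write */

/* Does the device implement the "enhanced" reliable write? */
static int supports_enh_rel_wr(const uint8_t *ext_csd)
{
    return (ext_csd[EXT_CSD_WR_REL_PARAM] & WR_REL_PARAM_EN_REL_WR) != 0;
}

/* Is general-purpose partition n (1..4) set for reliable writes?
 * Per the excerpt, WR_DATA_REL_n is bit n of WR_REL_SET. */
static int partition_is_reliable(const uint8_t *ext_csd, int n)
{
    return (ext_csd[EXT_CSD_WR_REL_SET] >> n) & 1u;
}
```

In other words, with HS_CTRL_REL clear the reliability setting per partition is fixed by the device, which matches the "read only" wording in the excerpt.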
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-26 20:38 ` MMC and reliable write - was: " Peter Waechtler @ 2011-04-26 22:45 ` Jamie Lokier 2011-04-27 1:13 ` Andrei Warkentin 0 siblings, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-04-26 22:45 UTC (permalink / raw) To: linux-arm-kernel Peter Waechtler wrote: > JEDEC Standard No. 84-A441 > Page 56 > > > Reliable Write: Multiple block write with pre-defined block count and > Reliable Write parameters. This transaction is similar to the basic pre- > defined multiple-block write (defined in previous bullet) with the > following exceptions. The old data pointed to by a logical address must remain > unchanged until the new data written to same logical address has been > successfully programmed. This is to ensure that the target address > updated by the reliable write transaction never contains undefined data. > > Data must remain valid even if a sudden power loss occurs during the > programming. > > There are two versions of reliable write: legacy implementation and the > enhance implementation. The type of reliable write supported by the device is > indicated by the EN_REL_WR bit in the > WR_REL_PARAM extended CSD register. > For the case of EN_REL_WR = 0 : > > > More fun on page 147ff: > > ? WR_REL_SET [167] > The write reliability settings register indicates the reliability setting for > each of the user and general > area partitions in the device. The contents of this register are read only if > the HS_CTRL_REL is 0 in > the WR_REL_PARAM extended CSD register. The default value of these bits is not > specified and is > determined by the device. > > > it goes on with: > > Bit[4]: WR_DATA_REL_4 > 0x0: In general purpose partition 4, the write operation has been optimized > for performance and existing data in the partition could be at risk if a power > failure occurs. 
>
> 0x1: In general purpose partition 4, the device protects previously
> written data if power failure occurs during a write operation.

Hmm...  It all hinges on whether "previously written data" refers just
to the region being overwritten, or to all the other data in the
partition?

If MMC writes are specified to only affect the data being written with
a Write command, and to have stably committed the data when Write
returns, then "Reliable Write" just means "atomic", and filesystems
and databases don't actually need that.

Hard disks don't guarantee that, and it's not a problem.  Filesystems
and databases need barriers and/or durable (stable) commits, and for
writes in one area not to corrupt data in a different area.

*That's* a problem with other flash devices (and possibly some RAIDs):
Writes to one area can corrupt data in sectors that aren't being
written to, over quite a large distance.

I can't tell from the above specification excerpt (by itself) what is
being guaranteed; it seems ambiguous, but maybe there's a clearer
definition elsewhere.

It is conceivable that checksums and metadata could be stored into a
"reliable" partition and some kinds of file data into an "unreliable"
partition, where filesystem integrity is important and nobody cares
about the actual data! :-)

-- Jamie

^ permalink raw reply [flat|nested] 38+ messages in thread
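To make the barriers-vs-atomicity distinction above concrete: what a journalling filesystem needs from the device is an ordering/durability point between the journal record and the commit mark, not atomicity of each individual write. A user-space sketch of that pattern, using fsync() as the barrier (the record format and commit mark are made up for illustration):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a journal record, make it durable, then write the commit mark
 * and make that durable too.  If power fails between the two fsync()s,
 * replay sees a record without a commit mark and discards it -- no
 * corruption, PROVIDED writes here cannot corrupt unrelated sectors,
 * which is exactly the property in question for flash media. */
static int journal_commit(int fd, const void *rec, size_t len)
{
    char commit = 'C';

    if (write(fd, rec, len) != (ssize_t)len)
        return -1;
    if (fsync(fd) != 0)   /* barrier: record durable before the mark */
        return -1;
    if (write(fd, &commit, 1) != 1)
        return -1;
    return fsync(fd);     /* commit mark durable */
}
```

Note that neither step requires the write itself to be atomic; a torn record is detected at replay because the commit mark is missing.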
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-26 22:45 ` Jamie Lokier @ 2011-04-27 1:13 ` Andrei Warkentin 2011-04-27 13:07 ` Jamie Lokier 0 siblings, 1 reply; 38+ messages in thread From: Andrei Warkentin @ 2011-04-27 1:13 UTC (permalink / raw) To: linux-arm-kernel Hi Jamie, On Tue, Apr 26, 2011 at 5:45 PM, Jamie Lokier <jamie@shareable.org> wrote: > Peter Waechtler wrote: >> JEDEC Standard No. 84-A441 >> Page 56 >> >> >> Reliable Write: Multiple block write with pre-defined block count and >> Reliable Write parameters. This transaction is similar to the basic pre- >> defined multiple-block write (defined in previous bullet) with the >> following exceptions. The old data pointed to by a logical address must remain >> unchanged until the new data written to same logical address has been >> successfully programmed. This is to ensure that the target address >> updated by the reliable write transaction never contains undefined data. >> >> Data must remain valid even if a sudden power loss occurs during the >> programming. >> >> There are two versions of reliable write: legacy implementation and the >> enhance implementation. The type of reliable write supported by the device is >> indicated by the EN_REL_WR bit in the >> WR_REL_PARAM extended CSD register. >> ?For the case of EN_REL_WR = 0 : >> >> >> More fun on page 147ff: >> >> ? WR_REL_SET [167] >> The write reliability settings register indicates the reliability setting for >> each of the user and general >> area partitions in the device. The contents of this register are read only if >> the HS_CTRL_REL is 0 in >> the WR_REL_PARAM extended CSD register. The default value of these bits is not >> specified and is >> determined by the device. 
>> >> >> it goes on with: >> >> Bit[4]: WR_DATA_REL_4 >> 0x0: In general purpose partition 4, the write operation has been optimized >> for performance and existing data in the partition could be at risk if a power >> failure occurs. >> >> 0x1: In general purpose partition 4, the device protects previously written >> data if power failure occurs during a write operation. > > Hmm... ?It all hinges on whether "previously written data" refers just > to the region being overwritten, or to all the other data in the > partition? > > If MMC writes are specified to only affect the data being written with > a Write command, and to have stably committed the data when Write > returns, then "Reliable Write" just means "atomic", and filesystems > and databases don't actually need that. > > Hard disks don't guarantee that, and it's not a problem. ?Filesystems > and databases need barriers and/or durable (stable) commits, and for > writes in one area not to corrupt data in a different area. > > *That's* a problem with other flash devices (and possibly some RAIDs): > Writes to one area can corrupt data in sectors that aren't being > written to, over quite a large distance. > > I can't tell from the above specification excerpt (by itself) what is > being guaranteed; it seems ambiguous, but maybe there's a clearer > definition elsewhere. > > It is conceivable that checksums and metadata could be stored into a > "reliable" partition and some kinds of file data into an "unreliable" > partition, where filesystem integrity is important and nobody cares > about the actual data! :-) > > -- Jamie > I think this basically says - don't end up with corrupt flash if I pull the power when doing this MMC transaction. If you pull power during a regular write, you could end up with ALL erase units affected being wiped. Note, that the new definition of reliable writes provides a guarantee to a sector boundary. 
So if you interrupt the transaction, you will end up with [new data] followed by [old data]. The old definition guaranteed the entire range, but the transaction was only reliable when done over a sector or erase unit. This means I jumped the gun on implementing REQ_FUA as reliable write, as REQ_FUA says nothing about atomicity. OTOH, I don't think anything in the block layer expects massive data corruption on power loss. In my defence, I saw REQ_FUA as being "prevent data corruption during power loss", hence the reliable write via REQ_FUA in mmc layer. So my question - a) how should reliable writes be handled? REQ_META? b) how do we make sure we don't wind up with data corruption on MMCs for workloads where you know power can be removed at any moment? We could always turn on reliable writes (not good perf wise). We could turn on reliable writes for a particular range (enhanced user partition). We could also turn on reliable writes for a specific hardware partition. We could even create a mapping layer that will occasionally atomically flush data to flash, while the actual fs accesses go to RAM. A ^ permalink raw reply [flat|nested] 38+ messages in thread
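The WR_REL_SET [167] byte quoted from the JEDEC excerpt above can be decoded per partition. A minimal sketch: only Bit[4] (WR_DATA_REL_4, general purpose partition 4) actually appears in the quoted text; the assignment of bits 0-3 to the user area and general purpose partitions 1-3 is assumed here by analogy and should be checked against the full standard.

```python
# Sketch: decode WR_REL_SET (EXT_CSD byte 167) from JEDEC 84-A441.
# Assumed bit layout: bit 0 = user area, bits 1-4 = GP partitions 1-4;
# only bit 4 (WR_DATA_REL_4) is confirmed by the spec excerpt above.

def decode_wr_rel_set(wr_rel_set: int) -> dict:
    """Map each partition to True if writes to it are power-fail protected."""
    names = ["user", "gp1", "gp2", "gp3", "gp4"]
    return {name: bool(wr_rel_set & (1 << bit)) for bit, name in enumerate(names)}

# A card reporting 0x10 would protect only general purpose partition 4:
print(decode_wr_rel_set(0x10))
```

Note that, per the quoted text, these bits are read-only once HS_CTRL_REL in WR_REL_PARAM is 0, so decoding them tells you the card's policy but does not let you change it.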
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-27 1:13 ` Andrei Warkentin @ 2011-04-27 13:07 ` Jamie Lokier 2011-04-27 19:18 ` Andrei Warkentin 0 siblings, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-04-27 13:07 UTC (permalink / raw) To: linux-arm-kernel Andrei Warkentin wrote: > I think this basically says - don't end up with corrupt flash if I > pull the power when doing this MMC transaction. > If you pull power during a regular write, you could end up with ALL > erase units affected being wiped. > > Note, that the new definition of reliable writes provides a guarantee > to a sector boundary. So if you interrupt > the transaction, you will end up with [new data] followed by [old > data]. The old definition guaranteed the entire range, > but the transaction was only reliable when done over a sector or erase unit. The old definition might not have been implemented in practice, or might have caused performance problems -- or maybe it just wasn't that useful, because it's so different from what hard-disk-like filesystems expect of a block device. > This means I jumped the gun on implementing REQ_FUA as reliable write, > as REQ_FUA says nothing about atomicity. > OTOH, I don't think anything in the block layer expects massive data > corruption on power loss. In my defence, I saw REQ_FUA > as being "prevent data corruption during power loss", hence the > reliable write via REQ_FUA in mmc layer. > > So my question - > a) how should reliable writes be handled? If your understanding is this: - "Reliable Write" only affects the range being written - "Normal Write" can corrupt ANY random part of the flash (because you don't know where the physical erase blocks are, or what reorganising it might provoke.) Then the answer's pretty clear. You have to use "Reliable Write" for everything. > REQ_META? 
No, that's a scheduling hint; you can't assume filesystems consistently label "metadata needed for filesystem integrity" with that flag. (And databases and VMs have similar needs, but don't get to choose REQ_ flags). But even if they did, wouldn't a single normal write, from the above description, potentially corrupt all previously written metadata anyway, making it pointless? > b) how do we make sure to not wind up with data corruption and MMCs > for work loads where you know power can be removed at any moment? > We could always turn on reliable writes (not good perf wise). We could > turn on reliable writes for a particular range (enhanced user > partition). We could also turn on reliable writes for a specific > hardware partition. It might have to be simply a mount option - let the user decide their priorities. What's "enhanced user partition" -- is it a device feature? > We could even create mapping layer that will occasionally atomically > flush data to flash, while the actual fs accesses go to RAM. So using "Reliable Write" all the time, and using a flash-optimised filesystem (MTD-style like jffs2, ubifs, logfs) to group the writes consecutively into sensible block sizes? I guess if the first small "Reliable Write" is quite slow, and a flash-optimised filesystem performs write-behind just like disk filesystems, then there will be plenty more data records queued up for writing after it, automatically making the write sizes increase to match the media's speed. Add a little "anticipatory scheduling" perhaps. I presume "Reliable Write" must be to a contiguous range of the MMC's logical block presentation? -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-27 13:07 ` Jamie Lokier @ 2011-04-27 19:18 ` Andrei Warkentin 2011-04-27 19:33 ` Arnd Bergmann 2011-05-03 8:04 ` Jamie Lokier 0 siblings, 2 replies; 38+ messages in thread From: Andrei Warkentin @ 2011-04-27 19:18 UTC (permalink / raw) To: linux-arm-kernel On Wed, Apr 27, 2011 at 8:07 AM, Jamie Lokier <jamie@shareable.org> wrote: > Andrei Warkentin wrote: >> I think this basically says - don't end up with corrupt flash if I >> pull the power when doing this MMC transaction. >> If you pull power during a regular write, you could end up with ALL >> erase units affected being wiped. >> >> Note that the new definition of reliable writes provides a guarantee >> to a sector boundary. So if you interrupt >> the transaction, you will end up with [new data] followed by [old >> data]. The old definition guaranteed the entire range, >> but the transaction was only reliable when done over a sector or erase unit. > > The old definition might not have been implemented in practice, or > might have caused performance problems -- or maybe it just wasn't that > useful, because it's so different from what hard-disk-like filesystems > expect of a block device. > >> This means I jumped the gun on implementing REQ_FUA as reliable write, >> as REQ_FUA says nothing about atomicity. >> OTOH, I don't think anything in the block layer expects massive data >> corruption on power loss. In my defence, I saw REQ_FUA >> as being "prevent data corruption during power loss", hence the >> reliable write via REQ_FUA in mmc layer. >> >> So my question - >> a) how should reliable writes be handled? > > If your understanding is this: > > - "Reliable Write" only affects the range being written > > - "Normal Write" can corrupt ANY random part of the flash > (because you don't know where the physical erase blocks are, or > what reorganising it might provoke.) > > Then the answer's pretty clear.
> You have to use "Reliable Write" for everything. > >> REQ_META? > > No, that's a scheduling hint; you can't assume filesystems > consistently label "metadata needed for filesystem integrity" with > that flag. (And databases and VMs have similar needs, but don't get > to choose REQ_ flags). > > But even if they did, wouldn't a single normal write, from the above > description, potentially corrupt all previously written metadata > anyway, making it pointless? Gah... yes. > >> b) how do we make sure to not wind up with data corruption and MMCs >> for work loads where you know power can be removed at any moment? > >> We could always turn on reliable writes (not good perf wise). We could >> turn on reliable writes for a particular range (enhanced user >> partition). We could also turn on reliable writes for a specific >> hardware partition. > > It might have to be simply a mount option - let the user decide their > priorities. So basically add a new REQ_ flag - something like REQ_SAFE, which would ensure that data on block storage is not corrupted due to interrupting this write (or even, after the write, if the card does some optimizations). We already have a flag that ensures corruptions don't occur because of local-to-disk caches - REQ_FUA, so this would just mean thinking about what effects REQ_FUA already has that aren't considered. On a (spinning) disk, I can't imagine that interrupting a REQ_FUA write would cause data loss somewhere other than where data was written. Then it would be as simple as a mount flag that would ensure all (write) accesses are FUA accesses, to ensure desired behavior for platforms where power could be cut at any moment. What do you think? Yes, all write transactions for MMC are contiguous. A ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-27 19:18 ` Andrei Warkentin @ 2011-04-27 19:33 ` Arnd Bergmann 2011-05-03 8:04 ` Jamie Lokier 1 sibling, 0 replies; 38+ messages in thread From: Arnd Bergmann @ 2011-04-27 19:33 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 27 April 2011 21:18:16 Andrei Warkentin wrote: > On Wed, Apr 27, 2011 at 8:07 AM, Jamie Lokier <jamie@shareable.org> wrote: > > Andrei Warkentin wrote: > > No, that's a scheduling hint; you can't assume filesystems > > consistently label "metadata needed for filesystem integrity" with > > that flag. (And databases and VMs have similar needs, but don't get > > to choose REQ_ flags). > > > > But even if they did, wouldn't a single normal write, from the above > > description, potentially corrupt all previously written metadata > > anyway, making it pointless? > > Gah... yes. I've also seen devices that produce silent bit errors or just swap blocks around without any power fail scenario, no matter what or how you write to them. I believe we don't need to support those. We should find out what the guarantees are that the eMMC standard is giving. It should be possible to build media that can not corrupt data written with reliable writes by writing to other data in the same erase block. If the standard requires that to happen, I'd say we should rely on it and consider the other media as broken. Arnd ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-04-27 19:18 ` Andrei Warkentin 2011-04-27 19:33 ` Arnd Bergmann @ 2011-05-03 8:04 ` Jamie Lokier 2011-06-06 10:28 ` Pavel Machek 1 sibling, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-05-03 8:04 UTC (permalink / raw) To: linux-arm-kernel Andrei Warkentin wrote: > >> b) how do we make sure to not wind up with data corruption and MMCs > >> for work loads where you know power can be removed at any moment? > > > >> We could always turn on reliable writes (not good perf wise). We could > >> turn on reliable writes for a particular range (enhanced user > >> partition). We could also turn on reliable writes for a specific > >> hardware partition. > > > > It might have to be simply a mount option - let the user decide their > > priorities. > > So basically add a new REQ_ flag - something like REQ_SAFE, which > would ensure that data > on block storage is not corrupted due to interrupting this write (or > even, after the write, if the card does some optimizations). We > already have a flag that ensures corruptions don't occur > because of local-to-disk caches - REQ_FUA, so this would just mean thinking > about what effects REQ_FUA already has that aren't considered. On a > (spinning) disk, I can't imagine that interrupting a REQ_FUA write would > cause data loss somewhere other than where data was written. > > Then it would be as simple as a mount flag that would ensure all > (write) accesses are FUA accesses, to ensure desired behavior for > platforms where power could be cut at any moment. I think you're mixing up different concepts. On a spinning hard disk, _all_ writes don't cause data loss other than where data is written, rounded up to the sector (512 or 4096 bytes). The FUA flag doesn't make any difference to this. Storage durability doesn't depend on FUA.
It makes no difference except to performance - on disks without FUA support (like my laptop), filesystems just issue more cache flush commands. Or you can also disable the write cache, the effect of which varies between hardly noticeable and awfully slow, depending on the disk. (Some RAIDs may violate both these principles. It depends how they are implemented. I'm not clear on where Linux software RAID sits with this.) So don't think of FUA as having any connection with reliability or durability. It's just an optional performance optimisation. I don't think REQ_SAFE is a useful name for the MMC option as it doesn't appear to be safe, if later non-reliable writes can randomly clobber the REQ_SAFE data. As it has been explained so far, I don't see how filesystems can take advantage of the reliable/unreliable write distinction, without more precise constraints on what that means. So I don't think there's any point in filesystems issuing both types of request, unless those more precise constraints appear somewhere. Hence the idea of making it a block device/partition flag. Similar to the way hdparm is used to manage a hard disk's "write cache enabled" bit, there could be a "mmc use reliable writes" bit, a "mmc has reliable writes" read-only bit, and a "mmc hard partition is reliable" bit which may or may not be writable. If it later emerges that filesystems can benefit from the distinction, add a REQ_ flag at that time. Even then, filesystems may need to know the MMC's hard partition mode, in order to make useful decisions. -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-05-03 8:04 ` Jamie Lokier @ 2011-06-06 10:28 ` Pavel Machek 2011-06-06 20:38 ` Peter Waechtler 0 siblings, 1 reply; 38+ messages in thread From: Pavel Machek @ 2011-06-06 10:28 UTC (permalink / raw) To: linux-arm-kernel Hi! > > So basically add a new REQ_ flag - something like REQ_SAFE, which > > would ensure that data > > on block storage is not corrupted due to interrupting this write (or > > even, after the write, if the card does some optimizations). We > > already have a flag that ensures corruptions don't occur > > because of local-to-disk caches - REQ_FUA, so this would just mean thinking > > about what effects REQ_FUA already has that aren't considered. On a > > (spinning) disk, I can't imagine that interrupting a REQ_FUA write would > > cause data loss somewhere other than where data was written. > > > > Then it would be as simple as a mount flag that would ensure all > > (write) accesses are FUA accesses, to ensure desired behavior for > > platforms where power could be cut at any moment. > > I think you're mixing up different concepts. > > On a spinning hard disk, _all_ writes don't cause data loss other than > where data is written, rounded up to the sector (512 or 4096 bytes). ... Yes, so on mmc there are two different problems: * reliability of write itself (REL_WRITE solves that) * reliability of data around write (there are four bits "controlling" it in 4.4.1 MMC specs, unfortunately they are only writable by the card manufacturer AFAICS). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* MMC and reliable write - was: since when does ARM map the kernel memory in sections? 2011-06-06 10:28 ` Pavel Machek @ 2011-06-06 20:38 ` Peter Waechtler 0 siblings, 0 replies; 38+ messages in thread From: Peter Waechtler @ 2011-06-06 20:38 UTC (permalink / raw) To: linux-arm-kernel On Monday, June 6, 2011 at 12:28:55, Pavel Machek wrote: > Hi! > > > > So basically add a new REQ_ flag - something like REQ_SAFE, which > > > would ensure that data > > > on block storage is not corrupted due to interrupting this write (or > > > even, after the write, if the card does some optimizations). We > > > already have a flag that ensures corruptions don't occur > > > because of local-to-disk caches - REQ_FUA, so this would just mean thinking > > > about what effects REQ_FUA already has that aren't considered. On a > > > (spinning) disk, I can't imagine that interrupting a REQ_FUA write would > > > cause data loss somewhere other than where data was written. > > > > > > Then it would be as simple as a mount flag that would ensure all > > > (write) accesses are FUA accesses, to ensure desired behavior for > > > platforms where power could be cut at any moment. > > > > I think you're mixing up different concepts. > > > > On a spinning hard disk, _all_ writes don't cause data loss other than > > where data is written, rounded up to the sector (512 or 4096 bytes). > > ... > > Yes, so on mmc there are two different problems: > > * reliability of write itself (REL_WRITE solves that) > > * reliability of data around write (there are four bits "controlling" > it in 4.4.1 MMC specs, unfortunately they are only writable by the card > manufacturer AFAICS). > Pavel Yes - and there is another issue (but you could say that it fits under point 2): background operations. After programming a page the "pairing" page should be corrected. Then there is garbage collection and wear leveling. If this is interrupted/disturbed by a sudden power loss "at the wrong moment": user data possibly gets corrupted.
The power supply has to give you a signal that power will be lost, say, in 30 ms. Now a mechanism has to be specified for telling the device to stop any programming in a timely fashion - the HPI (high priority interrupt) comes to mind. Peter ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-26 10:33 ` Per Forlin 2011-04-26 19:00 ` Peter Waechtler @ 2011-04-26 20:24 ` Andrei Warkentin 2011-04-26 22:58 ` Jamie Lokier 1 sibling, 1 reply; 38+ messages in thread From: Andrei Warkentin @ 2011-04-26 20:24 UTC (permalink / raw) To: linux-arm-kernel Hi Per, On Tue, Apr 26, 2011 at 5:33 AM, Per Forlin <per.forlin@linaro.org> wrote: > On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote: >> 2011/4/22 Pavel Machek <pavel@ucw.cz>: >> >>> Plus, I was told new MMC standard has "write reliable" option... >> >> I think Per Förlin looked into reliable write. The latest eMMC cards >> have this, but OTOMH it was too darn slow to be used on current >> chips/"cards". >> >> Per, Sebastian: any details? > I had plans to add reliable writes and do benchmarking but I never got > to it. Right now I have no plans to pick it up. > >> Reliable writes are in mmc-next already. As an improvement to that path, I have a CMD23-bounded request support patch set which is pending. Reliable writes are exposed via REQ_FUA. Keep in mind that flash cards don't have a volatile cache, so once an MMC transaction goes through the data is in flash. All reliable writes guarantee is flash state if an MMC transaction is interrupted in the middle. Additionally, the "new" reliable write (as opposed to legacy) is even less useful, since it only provides that guarantee at a sector boundary. A ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-26 20:24 ` Andrei Warkentin @ 2011-04-26 22:58 ` Jamie Lokier 2011-04-27 0:27 ` Andrei Warkentin 0 siblings, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-04-26 22:58 UTC (permalink / raw) To: linux-arm-kernel Andrei Warkentin wrote: > Hi Per, > > On Tue, Apr 26, 2011 at 5:33 AM, Per Forlin <per.forlin@linaro.org> wrote: > > On 23 April 2011 11:23, Linus Walleij <linus.walleij@linaro.org> wrote: > >> 2011/4/22 Pavel Machek <pavel@ucw.cz>: > >> > >>> Plus, I was told new MMC standard has "write reliable" option... > >> > >> I think Per Förlin looked into reliable write. The latest eMMC cards > >> have this, but OTOMH it was too darn slow to be used on current > >> chips/"cards". > >> > >> Per, Sebastian: any details? > > I had plans to add reliable writes and do benchmarking but I never got > > to it. Right now I have no plans to pick it up. > > > >> > > Reliable writes are in mmc-next already. As an improvement to that > path, I have a CMD23-bounded request support patch set which is > pending. > > Reliable writes are exposed via REQ_FUA. Are you sure that's appropriate? Unless I have misunderstood (very possible), REQ_FUA means writes hit non-volatile storage before acknowledgement, not that they are atomic. I think the normal users of REQ_FUA don't require or expect large atomic writes; they use it as a shortcut for (write & flush this write) without implying anything else is flushed. > Keep in mind that flash cards don't have a volatile cache, so once an > MMC transaction goes through the data is in flash. Does that not mean MMC already provides REQ_FUA semantics on every transaction?
I don't know much about MMC, but the problems reported with other flash devices are either volatile cache (so may not apply to conformant MMCs), or random corruption of data that was supposed to be stored long ago, even data quite far from the locations being written at the time, because the flash is presumably reorganising itself. There are even reports of data loss resulting from power removal while reading. > All reliable writes guarantee is flash state if an MMC transaction > is interrupted in the middle. Additionally, the "new" reliable write > (as opposed to legacy) is even less useful, since it only provides > that guarantee at a sector boundary. Perhaps the sector boundary limitation makes it faster and/or limits the amount of buffer required, and/or allows the device to accept larger write transactions. Which is good if it means reliability doesn't get switched off or faked. Or perhaps it's just to align, a little, with perceived behaviour of hard disks. Hard disks don't guarantee large atomic writes as far as I know, so filesystems & databases generally don't assume it, and it's not really a problem. Some people say you can rely on a single 512-byte sector being atomically updated or not on a hard disk, but some don't; I'm siding with the latter. (SQLite has a storage flag you can set if you know the storage has that property, to tweak its commit strategy.) -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-26 22:58 ` Jamie Lokier @ 2011-04-27 0:27 ` Andrei Warkentin 2011-04-27 13:19 ` Jamie Lokier 0 siblings, 1 reply; 38+ messages in thread From: Andrei Warkentin @ 2011-04-27 0:27 UTC (permalink / raw) To: linux-arm-kernel On Tue, Apr 26, 2011 at 5:58 PM, Jamie Lokier <jamie@shareable.org> wrote: >> >> Reliable writes are exposed via REQ_FUA. > > Are you sure that's appropriate? > > Unless I have misunderstood (very possible), REQ_FUA means writes hit > non-volatile storage before acknowledgement, not that they are atomic. > I think the normal users of REQ_FUA don't require or expect large > atomic writes; they use it as a shortcut for (write & flush this > write) without implying anything else is flushed. I would agree with you that it's not the best mapping. However, a failed MMC write transaction has other properties. If I understand correctly, depending on mode of failure (say pulling power), you might wind up with extra data getting erased (because erase happens at erase unit boundary), and erase can be done before all the data was transferred from host to card. The connection I made between FUA and reliable writes, is that you were guaranteed "physical presence" of the written data on storage medium as long as the transaction went through successfully. I can see where I assumed more than I should have.... If that's not the correct interpretation I will change it. REQ_META doesn't sound like the right candidate, because it's enforcing policy. Should there be a REQ_ATOMIC request type? A ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-27 0:27 ` Andrei Warkentin @ 2011-04-27 13:19 ` Jamie Lokier 2011-04-27 13:32 ` Arnd Bergmann 0 siblings, 1 reply; 38+ messages in thread From: Jamie Lokier @ 2011-04-27 13:19 UTC (permalink / raw) To: linux-arm-kernel Andrei Warkentin wrote: > The connection I made between FUA and reliable writes, is that you > were guaranteed "physical presence" of the written data on > storage medium as long as the transaction went through successfully. I > can see where I assumed more than I should have.... If that's not the > correct interpretation I will change it. Well it would seem like a reasonable thing to *want* from the storage medium. :-) Maybe that's intended but not stated clearly in that excerpted bit of the MMC spec. With SATA, you can do FUA, or you can write non-FUA and do a FLUSH later to get the same level of durability (I presume it's the same after that) -- or you can turn off the write cache (which is ok with some disks/interfaces and wrecks the performance of others.) The fact that ordinary writes can be committed too, and you can wait for that, is quite important in practice. It's the basis for all reliable journalling and efficiently log-structured storage. > REQ_META doesn't sound like the right candidate, because it's > enforcing policy. Should there be a REQ_ATOMIC request type? Imho, only if there's a use for it. If this is about whole partitions picking up random data corruption, versus not doing so, then I suggest the choice of "Reliable Write" vs. "Unreliable Write" be a mount option or hdparm-style block device option. If there are tighter guarantees, such as "Unreliable Write" corruption being limited to the written naturally aligned 1MB blocks (say), and it was genuinely faster, that would be really valuable information to pass up to filesystems - and to userspace - as you can structure reliability around that in lots of ways.
-- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-27 13:19 ` Jamie Lokier @ 2011-04-27 13:32 ` Arnd Bergmann 2011-04-27 18:50 ` Peter Waechtler 0 siblings, 1 reply; 38+ messages in thread From: Arnd Bergmann @ 2011-04-27 13:32 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 27 April 2011, Jamie Lokier wrote: > Imho, only if there's a use for it. If this is about whole partitions > picking up random data corruption, versus not doing so, then I suggest > the choice of "Reliable Write" vs. "Unreliable Write" be a mount > option or hdparm-style block device option. > > If there are tighter guarantees, such as "Unreliable Write" corruption > being limited to the written naturally aligned 1MB blocks (say), and > it was genuinely faster, that would be really valuable information to > pass up to filesystems - and to userspace - as you can structure > reliability around that in lots of ways. In all the SDHC cards that I have seen, the corruption should be local to an erase block of the size that is supposedly found in /sys/block/mmcblk*/device/preferred_erase_size, which is typically 4 MB. However, I don't think that the standard actually guarantees this and, worse, some cards that I have seen actually lie about the erase block size and claim that it is 4 MB when it is actually 1.5, 2, 3 or 8 MB. For eMMC devices, I don't think we can read the erase block size. Arnd ^ permalink raw reply [flat|nested] 38+ messages in thread
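If Arnd's observation holds, corruption from an interrupted unreliable write stays inside one erase block of the size reported in /sys/block/mmcblk*/device/preferred_erase_size. A practical consequence is that partitions should start on an erase-block boundary, or one partition's writes can endanger its neighbour. A minimal sketch of that alignment check (the 4 MB figure and the sector numbers are illustrative, not from the thread):

```python
# Sketch: check whether a partition's start offset is aligned to the card's
# erase block size, so that erase-block-local corruption cannot cross a
# partition boundary. Erase size would come from the sysfs attribute Arnd
# names; here it is hard-coded to the typical 4 MB he mentions.

def is_aligned(part_start_bytes: int, erase_size_bytes: int) -> bool:
    """True if the partition start falls on an erase block boundary."""
    return part_start_bytes % erase_size_bytes == 0

ERASE = 4 * 1024 * 1024  # typical preferred_erase_size, per the thread

# A partition starting at sector 8192 (4 MiB, 512-byte sectors) is aligned;
# one starting at sector 63 (an old fdisk default) is not:
print(is_aligned(8192 * 512, ERASE), is_aligned(63 * 512, ERASE))
```

Of course, as Arnd notes, this only helps if the card reports its erase block size truthfully.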
* since when does ARM map the kernel memory in sections? 2011-04-27 13:32 ` Arnd Bergmann @ 2011-04-27 18:50 ` Peter Waechtler 2011-04-27 18:58 ` Andrei Warkentin 0 siblings, 1 reply; 38+ messages in thread From: Peter Waechtler @ 2011-04-27 18:50 UTC (permalink / raw) To: linux-arm-kernel On Wednesday, April 27, 2011 at 15:32:24, Arnd Bergmann wrote: > On Wednesday 27 April 2011, Jamie Lokier wrote: > > Imho, only if there's a use for it. If this is about whole partitions > > picking up random data corruption, versus not doing so, then I suggest > > the choice of "Reliable Write" vs. "Unreliable Write" be a mount > > option or hdparm-style block device option. > > > > If there are tighter guarantees, such as "Unreliable Write" corruption > > being limited to the written naturally aligned 1MB blocks (say), and > > it was genuinely faster, that would be really valuable information to > > pass up to filesystems - and to userspace - as you can structure > > reliability around that in lots of ways. > > In all the SDHC cards that I have seen, the corruption should be local to > an erase block of the size that is supposedly found in > /sys/block/mmcblk*/device/preferred_erase_size, which is typically 4 MB. > > However, I don't think that the standard actually guarantees this and, > worse, some cards that I have seen actually lie about the erase block > size and claim that it is 4 MB when it is actually 1.5, 2, 3 or 8 MB. > > For eMMC devices, I don't think we can read the erase block size. > I have to check, but I seem to remember that it can be calculated from values provided in CSD/ext CSD or whatever that acronym was... 4 or 8MB sounds familiar to me (and my problem). Peter ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-27 18:50 ` Peter Waechtler @ 2011-04-27 18:58 ` Andrei Warkentin 0 siblings, 0 replies; 38+ messages in thread From: Andrei Warkentin @ 2011-04-27 18:58 UTC (permalink / raw) To: linux-arm-kernel On Wed, Apr 27, 2011 at 1:50 PM, Peter Waechtler <pwaechtler@mac.com> wrote: > On Wednesday, April 27, 2011 at 15:32:24, Arnd Bergmann wrote: >> On Wednesday 27 April 2011, Jamie Lokier wrote: >> > Imho, only if there's a use for it. If this is about whole partitions >> > picking up random data corruption, versus not doing so, then I suggest >> > the choice of "Reliable Write" vs. "Unreliable Write" be a mount >> > option or hdparm-style block device option. >> > >> > If there are tighter guarantees, such as "Unreliable Write" corruption >> > being limited to the written naturally aligned 1MB blocks (say), and >> > it was genuinely faster, that would be really valuable information to >> > pass up to filesystems - and to userspace - as you can structure >> > reliability around that in lots of ways. >> >> In all the SDHC cards that I have seen, the corruption should be local to >> an erase block of the size that is supposedly found in >> /sys/block/mmcblk*/device/preferred_erase_size, which is typically 4 MB. >> >> However, I don't think that the standard actually guarantees this and, >> worse, some cards that I have seen actually lie about the erase block >> size and claim that it is 4 MB when it is actually 1.5, 2, 3 or 8 MB. >> >> For eMMC devices, I don't think we can read the erase block size. >> > I have to check, but I seem to remember that it can be calculated from values > provided in CSD/ext CSD or whatever that acronym was... 4 or 8MB sounds > familiar to me (and my problem). > Yep, there are ERASE_GRP_SIZE and ERASE_GRP_MULT in the CSD, and then their high-capacity variant in EXT_CSD (HC_ERASE_GRP_SIZE). A ^ permalink raw reply [flat|nested] 38+ messages in thread
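The arithmetic behind the fields Andrei names is short. A sketch per JEDEC 84-A441: the legacy CSD encodes the erase group as two 5-bit fields, and the high-capacity EXT_CSD byte HC_ERASE_GRP_SIZE counts 512 KiB units; 512-byte write blocks are assumed for the legacy case.

```python
# Sketch: eMMC erase group size from the CSD / EXT_CSD fields mentioned above.

def legacy_erase_group_bytes(erase_grp_size: int, erase_grp_mult: int) -> int:
    # Legacy CSD: erase group = (ERASE_GRP_SIZE + 1) * (ERASE_GRP_MULT + 1)
    # write blocks, assuming 512-byte write blocks.
    return (erase_grp_size + 1) * (erase_grp_mult + 1) * 512

def hc_erase_group_bytes(hc_erase_grp_size: int) -> int:
    # High-capacity EXT_CSD: erase unit = HC_ERASE_GRP_SIZE * 512 KiB.
    return hc_erase_grp_size * 512 * 1024

# The maximum legacy encoding (31, 31) gives 512 KiB; HC_ERASE_GRP_SIZE = 8
# matches the 4 MB figure Arnd quotes earlier in the thread:
print(legacy_erase_group_bytes(31, 31), hc_erase_group_bytes(8))
```

This also shows why the legacy fields cannot describe the multi-megabyte erase blocks of modern cards: the 5-bit encoding tops out at 512 KiB, which is why the high-capacity EXT_CSD variant exists.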
* since when does ARM map the kernel memory in sections? 2011-04-18 13:52 ` Pavel Machek 2011-04-18 17:07 ` Jamie Lokier @ 2011-04-18 19:21 ` Peter Waechtler 2011-04-18 17:24 ` Pavel Machek 2011-04-19 0:43 ` Jamie Lokier 1 sibling, 2 replies; 38+ messages in thread From: Peter Waechtler @ 2011-04-18 19:21 UTC (permalink / raw) To: linux-arm-kernel Hmh, I'm not 100% sure, but a lot of people have had good experiences with JFFS2. Is it that hard to get a log structured file system power fail safe? Either the end or "commit" block was written or not? Peter On 18 Apr 2011, at 03:52 PM, Pavel Machek <pavel@ucw.cz> wrote: Hi! > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > > Or was the memory always mapped in sections? > > > > > > I still have to chase a potential memory corruption. The rootfs is located on > > > a SDcard and gets corrupted even when the filesystem test programs write to > > > different partitions. > > > The test scenario includes several dozen or even hundreds of warm and cold > > > boot sequences, file system write tests with sudden soft resets. It's a large > > > embedded project with a lot of drivers and the fact that always the rootfs and > > > often the superblock gets damaged let me think of a memory corruption. > > > > > > > Sorry, I don't want to be obvious, but you mentioned sudden resets > > while writing, which is almost always going to wind > > up as fs corruptions, with the severity depending on the level of > > caching the system is doing to the writes. > > How are you mounting your rootfs and what file system are you using? > > What sort of corruptions to the super block are you seeing? > > If everything is implemented correctly, that depends on the type of > filesystem, block layer and storage. Some are explicitly designed to > be safe against sudden reboots and power failure - which is an > important feature of systems where removing the power is how they are > turned off at night.
...but note that no existing filesystem is safe on media such as USB sticks, SD and CF cards... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-18 19:21 ` Peter Waechtler @ 2011-04-18 17:24 ` Pavel Machek 2011-04-19 0:43 ` Jamie Lokier 1 sibling, 0 replies; 38+ messages in thread From: Pavel Machek @ 2011-04-18 17:24 UTC (permalink / raw) To: linux-arm-kernel Hi! (don't top post) > I'm not 100% sure, but a lot of people have had good experiences with JFFS2. jffs2 needs raw flash. CF & SD look like block devices. > Is it that hard to make a log-structured file system power-fail safe? Not that bad if you can access raw flash. Very bad if there's a layer that hides flash from you and corrupts data in between... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-18 19:21 ` Peter Waechtler 2011-04-18 17:24 ` Pavel Machek @ 2011-04-19 0:43 ` Jamie Lokier 1 sibling, 0 replies; 38+ messages in thread From: Jamie Lokier @ 2011-04-19 0:43 UTC (permalink / raw) To: linux-arm-kernel Peter Waechtler wrote: > Hmh, > > I'm not 100% sure, but a lot of people have had good experiences with JFFS2. > > Is it that hard to make a log-structured file system power-fail safe? > Either the end ("commit") block was written or it wasn't? Yes, it's very hard, maybe impossible, on the types of flash media being discussed. For a useful filesystem, it's not enough to know that a single "commit" block is written or not. The hard part is when the media writes things out of order, so at the next mount, even though you confirm the last "commit" is valid, any earlier blocks might not be valid. The only way to be sure of the last coherent state that you still have is to read everything, and even then a valid filesystem where random files have acquired holes they didn't have before is not very useful. A purely log-structured filesystem should at least look like a valid filesystem after power failure and reboot, but: - I am not sure that JFFS2 is purely log-structured any more, with the compact summary information in each block about what's written elsewhere in the block. That's not reliable on media where you can't guarantee anything about the order of writes. If the summary information were ignored, it would be reliable, but slower to read. - A valid filesystem, and valid files on it, are a bit different. fsync needs to work for applications to have some sanity in what they can depend on. JFFS2 is fine with that on NOR/NAND directly, but if the media doesn't guarantee order of writes... - Write disturb effects, where writing something corrupts data stored elsewhere, in other blocks that the filesystem thought were stable. 
- On some media, blocks far apart are less likely to disturb each other (for example some RAIDs), but not on flash translation media, where the physical layout and logical layout may be completely different. - Read disturb effects, where reading triggers flash reorganisation too, and then you pull the power while the flash is writing internally to reorganise, and on the next boot something's gone missing that was stable before. All the problems are avoidable on devices designed to avoid them. The problem is that devices aren't sold on the basis of these characteristics, and it's not something the next layer up can work around reliably, though it might be able to be more robust. -- Jamie ^ permalink raw reply [flat|nested] 38+ messages in thread
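Jamie's ordering argument can be made concrete with a toy model - purely illustrative, not a model of any real flash controller. If the device may persist an arbitrary reordering of the queued writes before power is lost, then finding a valid "commit" record at mount time proves nothing about the data blocks logically written before it:

```python
# Toy model: a device that persists some prefix of an arbitrary reordering
# of its queued writes before power loss. Shows that "COMMIT survived" does
# not imply "the data written before COMMIT survived".
import random

def power_fail(queued_writes, rng):
    """Persist an arbitrary prefix of an arbitrary reordering of the queue."""
    shuffled = list(queued_writes)
    rng.shuffle(shuffled)
    return set(shuffled[:rng.randrange(len(shuffled) + 1)])

rng = random.Random(0)
counterexample = None
for _ in range(1000):
    # Journal sequence: two data blocks, then the commit record.
    persisted = power_fail(["D1", "D2", "COMMIT"], rng)
    if "COMMIT" in persisted and not {"D1", "D2"} <= persisted:
        counterexample = persisted
        break

# With in-order persistence, COMMIT in `persisted` would guarantee D1 and D2
# are there too; with reordering, a counterexample turns up almost immediately.
```

This is exactly why a journal's commit-block check is only sound on media with write barriers or guaranteed ordering, and why "read everything" is the only safe recovery strategy otherwise.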
* since when does ARM map the kernel memory in sections? 2011-04-12 19:20 ` Andrei Warkentin 2011-04-12 20:33 ` Jamie Lokier @ 2011-04-13 6:51 ` Peter Wächtler 2011-04-13 15:44 ` Nicolas Pitre 1 sibling, 1 reply; 38+ messages in thread From: Peter Wächtler @ 2011-04-13 6:51 UTC (permalink / raw) To: linux-arm-kernel On Tuesday, 12 April 2011 at 21:20:14, Andrei Warkentin wrote: > Hi Peter, > > 2011/4/12 Peter Wächtler <pwaechtler@mac.com>: > > Hello Linux ARM developers, > > > > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > > Or was the memory always mapped in sections? > > > > I still have to chase a potential memory corruption. The rootfs is > > located on a SDcard and gets corrupted even when the filesystem test > > programs write to different partitions. > > The test scenario includes several dozen or even hundreds of warm and > > cold boot sequences, file system write tests with sudden soft resets. > > It's a large embedded project with a lot of drivers and the fact that > > always the rootfs and often the superblock gets damaged makes me think of > > a memory corruption. > > Sorry, I don't want to be obvious, but you mentioned sudden resets > while writing, which is almost always going to wind > up as fs corruptions, with the severity depending on the level of > caching the system is doing to the writes. > How are you mounting your rootfs and what file system are you using? > What sort of corruptions to the super block are you seeing? > It's using ext4 with metadata journalling in ordered mode. I had to check "if it's the FS" - the test programs create lots of directories and files while a timer is armed to issue a soft reset. The partitions that the "stress tests" run on survive it happily - just the rootfs, where almost nothing gets written, is severely damaged, so that fsck.ext4 will not repair it automatically. I am experimenting with mounting the rootfs read-only, without a journal - and getting results that are hard to interpret. 
The superblock gets written on a "mount -o remount,ro", the journal's superblock gets hit, and directories and inode bitmaps get "trashed". All on the file system where the actual tests do NOT run. In the past I tried to write patterns onto the blocks, without a file system, to check whether the SDIO device mangles some blocks - no, it didn't. I created a slab to move the superblock away from its "old" memory location - the superblock was no longer damaged. Now I want to stop experimenting and get a nice panic where I can see the root cause. But it could still be devices issuing wrong DMA transfers ;( Peter ^ permalink raw reply [flat|nested] 38+ messages in thread
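Peter's raw-block pattern check can be approximated from user space along these lines. The helper names and the LBA-stamped pattern are mine for illustration, not his actual tool, and the target path must be a scratch image file or a sacrificial device - never a disk holding data you care about:

```python
# Sketch of a raw-block pattern test: stamp each 512-byte sector with its own
# LBA, sync, and verify later. Misplaced or mangled writes show up as sectors
# whose contents don't match the pattern derived from their position.
import os

SECTOR = 512

def pattern_for(lba):
    # Fill the sector with its own LBA so a block landing in the wrong place
    # is detected, not just a block that was bit-flipped.
    return lba.to_bytes(8, "little") * (SECTOR // 8)

def write_pattern(path, first_lba, count):
    with open(path, "r+b") as f:
        for lba in range(first_lba, first_lba + count):
            f.seek(lba * SECTOR)
            f.write(pattern_for(lba))
        f.flush()
        os.fsync(f.fileno())  # push the data past the page cache

def verify_pattern(path, first_lba, count):
    """Return the list of LBAs whose contents no longer match the pattern."""
    bad = []
    with open(path, "rb") as f:
        for lba in range(first_lba, first_lba + count):
            f.seek(lba * SECTOR)
            if f.read(SECTOR) != pattern_for(lba):
                bad.append(lba)
    return bad
```

Running verify after a power cycle (or a soft reset like the ones in the test rig) tells you whether corruption happens below the filesystem, which is exactly the layer Peter was trying to isolate.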
* since when does ARM map the kernel memory in sections? 2011-04-13 6:51 ` Peter Wächtler @ 2011-04-13 15:44 ` Nicolas Pitre 2011-04-13 18:35 ` Peter Wächtler 0 siblings, 1 reply; 38+ messages in thread From: Nicolas Pitre @ 2011-04-13 15:44 UTC (permalink / raw) To: linux-arm-kernel On Wed, 13 Apr 2011, Peter Wächtler wrote: > On Tuesday, 12 April 2011 at 21:20:14, Andrei Warkentin wrote: > > How are you mounting your rootfs and what file system are you using? > > What sort of corruptions to the super block are you seeing? > > It's using ext4 with metadata journalling in ordered mode. > I had to check "if it's the FS" - the test programs create lots of directories > and files while a timer is armed to issue a soft reset. > The partitions that the "stress tests" run on survive it happily - just the > rootfs, where almost nothing gets written, is severely damaged so that fsck.ext4 > will not repair it automatically. SD cards are doing their own wear leveling internally and you have no control over it. Some blocks of data may be moved around, affecting a separate logical partition, even if you are not actively writing to that partition. Now if you cut power or reset the card while this is happening you'll certainly end up with data loss. Those SD cards are made to be both cheap and fast, meaning they're certainly not reliable with regards to unexpected interruptions. Furthermore, you should have a look at this article and referenced material: http://lwn.net/Articles/428584/. Nicolas ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-13 15:44 ` Nicolas Pitre @ 2011-04-13 18:35 ` Peter Wächtler 0 siblings, 0 replies; 38+ messages in thread From: Peter Wächtler @ 2011-04-13 18:35 UTC (permalink / raw) To: linux-arm-kernel On Wednesday, 13 April 2011 at 17:44:09, Nicolas Pitre wrote: > On Wed, 13 Apr 2011, Peter Wächtler wrote: > > On Tuesday, 12 April 2011 at 21:20:14, Andrei Warkentin wrote: > > > How are you mounting your rootfs and what file system are you using? > > > What sort of corruptions to the super block are you seeing? > > > > It's using ext4 with metadata journalling in ordered mode. > > I had to check "if it's the FS" - the test programs create lots of > > directories and files while a timer is armed to issue a soft reset. > > The partitions that the "stress tests" run on survive it happily - > > just the rootfs, where almost nothing gets written, is severely damaged so > > that fsck.ext4 will not repair it automatically. > > SD cards are doing their own wear leveling internally and you have no > control over it. Some blocks of data may be moved around, affecting a > separate logical partition, even if you are not actively writing to that > partition. Now if you cut power or reset the card while this is > happening you'll certainly end up with data loss. Those SD cards are > made to be both cheap and fast, meaning they're certainly not reliable > with regards to unexpected interruptions. > > Furthermore, you should have a look at this article and referenced > material: http://lwn.net/Articles/428584/. > > Yes, I know this article - and I know about wear levelling etc. Data loss is not the biggest problem. The capacity is huge (>>4GB) - therefore I wouldn't want to miss journalling - but I was not involved in the decision. Ten years ago I carefully ordered fsync and rename calls on ext2 on a 64 MiB CompactFlash - it worked well enough. 
The vendor knows about the requirements - perhaps learned it with another customer ;) I tried to "smash" the FS without success in the past - but the rootfs was mounted via NFS ;) And the partitions that get tortured stay intact - of course you can say that the wear levelling switches some blocks - but I don't buy it. Thanks for all the nice work on Linux - Russell included ;) Peter ^ permalink raw reply [flat|nested] 38+ messages in thread
* since when does ARM map the kernel memory in sections? 2011-04-12 18:52 since when does ARM map the kernel memory in sections? Peter Wächtler 2011-04-12 19:11 ` Colin Cross 2011-04-12 19:20 ` Andrei Warkentin @ 2011-04-12 20:15 ` Russell King - ARM Linux 2 siblings, 0 replies; 38+ messages in thread From: Russell King - ARM Linux @ 2011-04-12 20:15 UTC (permalink / raw) To: linux-arm-kernel On Tue, Apr 12, 2011 at 08:52:17PM +0200, Peter Wächtler wrote: > did the ARM Linux 2.6 kernel map the kernel memory in pages in the past? > Or was the memory always mapped in sections? Mainline kernels have never mapped memory using anything but sections. ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2011-06-06 20:38 UTC | newest] Thread overview: 38+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-12 18:52 since when does ARM map the kernel memory in sections? Peter Wächtler 2011-04-12 19:11 ` Colin Cross 2011-04-13 18:19 ` Peter Wächtler 2011-04-12 19:20 ` Andrei Warkentin 2011-04-12 20:33 ` Jamie Lokier 2011-04-13 15:27 ` Nicolas Pitre 2011-04-13 20:11 ` Jamie Lokier 2011-04-18 13:52 ` Pavel Machek 2011-04-18 17:07 ` Jamie Lokier 2011-04-18 17:17 ` Nicolas Pitre 2011-04-22 15:47 ` Pavel Machek 2011-04-23 9:23 ` Linus Walleij 2011-04-26 10:33 ` Per Forlin 2011-04-26 19:00 ` Peter Waechtler 2011-04-26 19:07 ` Jamie Lokier 2011-04-26 20:38 ` MMC and reliable write - was: " Peter Waechtler 2011-04-26 22:45 ` Jamie Lokier 2011-04-27 1:13 ` Andrei Warkentin 2011-04-27 13:07 ` Jamie Lokier 2011-04-27 19:18 ` Andrei Warkentin 2011-04-27 19:33 ` Arnd Bergmann 2011-05-03 8:04 ` Jamie Lokier 2011-06-06 10:28 ` Pavel Machek 2011-06-06 20:38 ` Peter Waechtler 2011-04-26 20:24 ` Andrei Warkentin 2011-04-26 22:58 ` Jamie Lokier 2011-04-27 0:27 ` Andrei Warkentin 2011-04-27 13:19 ` Jamie Lokier 2011-04-27 13:32 ` Arnd Bergmann 2011-04-27 18:50 ` Peter Waechtler 2011-04-27 18:58 ` Andrei Warkentin 2011-04-18 19:21 ` Peter Waechtler 2011-04-18 17:24 ` Pavel Machek 2011-04-19 0:43 ` Jamie Lokier 2011-04-13 6:51 ` Peter Wächtler 2011-04-13 15:44 ` Nicolas Pitre 2011-04-13 18:35 ` Peter Wächtler 2011-04-12 20:15 ` Russell King - ARM Linux