* Re: [Lsf-pc] [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)
  [not found] <CAKdFiL6CMHTHWCvSFjFNw6hzvPLDEWWAC-o3wdivq9co7SX1FA@mail.gmail.com>

From: Sasha Levin @ 2015-01-07 14:57 UTC (permalink / raw)
To: Adrian Palmer, lsf-pc; +Cc: linux-fsdevel, linux-ide, linux-scsi

On 01/06/2015 06:29 PM, Adrian Palmer wrote:
> I'd like to host a discussion of SMRFFS and ZAC for consumer and cloud
> systems at LSF/MM. I want to gather community consensus at LSF/MM of
> the required technical kernel changes before this topic is presented
> at Vault.
>
> Subtopics:
>
> On-disk metadata structures and data algorithms
> Explicit in-order write requirement and a look at the IO stack
> New IOCTLs to call from the FS and the need to know about the
> underlying disk -- no longer completely disk agnostic

Where can we read about the details of SMRFFS before LSF/MM / Vault?

Thanks,
Sasha

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Hannes Reinecke @ 2015-01-10 13:40 UTC (permalink / raw)
To: Sasha Levin, Adrian Palmer, lsf-pc; +Cc: linux-fsdevel, linux-ide, linux-scsi

On 01/07/2015 03:57 PM, Sasha Levin wrote:
> On 01/06/2015 06:29 PM, Adrian Palmer wrote:
>> I'd like to host a discussion of SMRFFS and ZAC for consumer and cloud
>> systems at LSF/MM. I want to gather community consensus at LSF/MM of
>> the required technical kernel changes before this topic is presented
>> at Vault.
>>
>> Subtopics:
>>
>> On-disk metadata structures and data algorithms
>> Explicit in-order write requirement and a look at the IO stack
>> New IOCTLs to call from the FS and the need to know about the
>> underlying disk -- no longer completely disk agnostic
>
> Where can we read about the details of SMRFFS before LSF/MM / Vault?

Please be aware that I've been working on a ZAC prototype, and have
applied for a session at LSF/MM. And my paper for Vault has just been
accepted (which will be a survey of existing filesystems and their
suitability for SMR drives).

Can you keep me in the loop here? Maybe we should have a joint session
at LSF ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Adrian Palmer @ 2015-01-13 20:32 UTC (permalink / raw)
To: lsf-pc; +Cc: linux-ide, linux-scsi, linux-fsdevel

This seemed to bounce on most of the lists to which it was originally
sent. I'm resending.

I've uploaded an introductory design document at
https://github.com/Seagate/SMR_FS-EXT4. I'll update it regularly.
Please feel free to send questions my way.

It seems there are many subtopics requested related to SMR for this
conference.

Adrian

On Tue, Jan 6, 2015 at 4:29 PM, Adrian Palmer <adrian.palmer@seagate.com> wrote:
> I agree wholeheartedly with Dr. Reinecke in discussing what is
> becoming my favourite topic also. I support the need for generic
> filesystem support with SMR and ZAC/ZBC drives.
>
> Dr. Reinecke has already proposed a discussion on the ZAC/ZBC
> implementation. As a complementary topic, I want to discuss the
> generic filesystem support for Host Aware (HA) / Host Managed (HM)
> drives.
>
> We at Seagate are developing an SMR Friendly File System (SMRFFS) for
> this very purpose. Instead of a new filesystem with a long development
> time, we are implementing it as an HA extension to EXT4 (and it WILL
> be backwards compatible with minimal code paths). I'll be talking
> about the on-disk changes we need to consider as well as the needed
> kernel changes common to all generic filesystems. Later, we intend to
> evaluate the work for use in other filesystems and kernel processes.
>
> I'd like to host a discussion of SMRFFS and ZAC for consumer and cloud
> systems at LSF/MM. I want to gather community consensus at LSF/MM of
> the required technical kernel changes before this topic is presented
> at Vault.
>
> Subtopics:
>
> On-disk metadata structures and data algorithms
> Explicit in-order write requirement and a look at the IO stack
> New IOCTLs to call from the FS and the need to know about the
> underlying disk -- no longer completely disk agnostic
>
> Adrian Palmer
> Firmware Engineer II
> R&D Firmware
> Seagate, Longmont Colorado
> 720-684-1307
* Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Andreas Dilger @ 2015-01-13 21:50 UTC (permalink / raw)
To: Adrian Palmer; +Cc: ext4 development, Linux Filesystem Development List

On Jan 13, 2015, at 1:32 PM, Adrian Palmer <adrian.palmer@seagate.com> wrote:
> This seemed to bounce on most of the lists to which it was originally
> sent. I'm resending.
>
> I've uploaded an introductory design document at
> https://github.com/Seagate/SMR_FS-EXT4. I'll update it regularly.
> Please feel free to send questions my way.
>
> It seems there are many subtopics requested related to SMR for this
> conference.

I'm replying to this on the linux-ext4 list since it is mostly of
interest to ext4 developers, and I'm not in control over who attends
the LSF/MM conference. Also, there will be an ext4 developer meeting
during/adjacent to LSF/MM that you should probably attend.

I think one of the important design decisions that needs to be made
early on is whether it is possible to directly access some storage
that can be updated with small random writes (either a separate flash
LUN on the device, or a section of the disk that is formatted for 4kB
sectors without SMR write requirements).

That would allow writing metadata (superblock, bitmap, group
descriptor, inode table, journal, in decreasing order of importance)
in random order instead of imposing possibly painful read-modify-write
or COW semantics on the whole filesystem.

As for the journal, I think it would be possible to handle that in a
way that is very SMR friendly. It is written in linear order, and if
mke2fs can size/align the journal file with SMR write regions then the
only thing that needs to happen is to size/align journal transactions
and the journal superblock with SMR write regions as well.
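For example, mke2fs could round the requested journal size up to a
whole number of write regions (a rough sketch of the arithmetic;
256 MiB zones and 4 KiB blocks are illustrative assumptions, and the
helper below is hypothetical, not e2fsprogs code):

```python
# Sketch: round a requested journal size up to whole SMR write regions
# so the journal file starts and ends on zone boundaries.
ZONE_BYTES = 256 * 1024 * 1024   # assumed zone size (device-reported)
BLOCK_BYTES = 4096               # assumed filesystem block size

def aligned_journal_blocks(requested_bytes,
                           zone_bytes=ZONE_BYTES, block_bytes=BLOCK_BYTES):
    """Return the journal length, in filesystem blocks, after rounding
    the requested byte size up to a whole number of zones."""
    zones = -(-requested_bytes // zone_bytes)   # ceiling division
    return zones * zone_bytes // block_bytes

# A default 128 MiB journal grows to fill one 256 MiB zone:
print(aligned_journal_blocks(128 * 1024 * 1024))   # 65536 blocks
```

Transactions and the journal superblock would then need the same
rounding applied within the file.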
I saw on your SMR_FS-EXT4 README that you are looking at an 8KB sector
size. Please correct my poor understanding of SMR, but isn't 8KB a lot
smaller than the actual erase block size (or chunks, or whatever they
are named)? I thought the erase blocks were on the order of MB in size?

Are you already aware of the "bigalloc" feature? That may provide most
of what you need already. It may be appropriate to default to e.g. 1MB
bigalloc size for SMR drives, so that it is clear to users that the
effective IO/allocation size is large for that filesystem.

> On Tue, Jan 6, 2015 at 4:29 PM, Adrian Palmer <adrian.palmer@seagate.com> wrote:
>> I agree wholeheartedly with Dr. Reinecke in discussing what is
>> becoming my favourite topic also. I support the need for generic
>> filesystem support with SMR and ZAC/ZBC drives.
>>
>> Dr. Reinecke has already proposed a discussion on the ZAC/ZBC
>> implementation. As a complementary topic, I want to discuss the
>> generic filesystem support for Host Aware (HA) / Host Managed (HM)
>> drives.
>>
>> We at Seagate are developing an SMR Friendly File System (SMRFFS) for
>> this very purpose. Instead of a new filesystem with a long
>> development time, we are implementing it as an HA extension to EXT4
>> (and it WILL be backwards compatible with minimal code paths). I'll
>> be talking about the on-disk changes we need to consider as well as
>> the needed kernel changes common to all generic filesystems. Later,
>> we intend to evaluate the work for use in other filesystems and
>> kernel processes.
>>
>> I'd like to host a discussion of SMRFFS and ZAC for consumer and
>> cloud systems at LSF/MM. I want to gather community consensus at
>> LSF/MM of the required technical kernel changes before this topic is
>> presented at Vault.
>>
>> Subtopics:
>>
>> On-disk metadata structures and data algorithms
>> Explicit in-order write requirement and a look at the IO stack
>> New IOCTLs to call from the FS and the need to know about the
>> underlying disk -- no longer completely disk agnostic
>>
>> Adrian Palmer
>> Firmware Engineer II
>> R&D Firmware
>> Seagate, Longmont Colorado
>> 720-684-1307
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
* Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Adrian Palmer @ 2015-01-13 23:26 UTC (permalink / raw)
To: Andreas Dilger; +Cc: ext4 development, Linux Filesystem Development List

Andreas;

Thanks. I appear to have overlooked the ext4 list for some reason (the
most obvious list).

On Tue, Jan 13, 2015 at 2:50 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Jan 13, 2015, at 1:32 PM, Adrian Palmer <adrian.palmer@seagate.com> wrote:
>> This seemed to bounce on most of the lists to which it was originally
>> sent. I'm resending.
>>
>> I've uploaded an introductory design document at
>> https://github.com/Seagate/SMR_FS-EXT4. I'll update it regularly.
>> Please feel free to send questions my way.
>>
>> It seems there are many subtopics requested related to SMR for this
>> conference.
>
> I'm replying to this on the linux-ext4 list since it is mostly of
> interest to ext4 developers, and I'm not in control over who attends
> the LSF/MM conference. Also, there will be an ext4 developer meeting
> during/adjacent to LSF/MM that you should probably attend.

Is this co-located, or part of LSF/MM? I would be very willing to
attend if I can.

> I think one of the important design decisions that needs to be made
> early on is whether it is possible to directly access some storage
> that can be updated with small random writes (either a separate flash
> LUN on the device, or a section of the disk that is formatted for 4kB
> sectors without SMR write requirements).

This would be nice, but I'm looking more generally at what I call
'single disk' systems. Several more complicated FSs use a separate
flash drive for this purpose, but ext4 expects 1 vdev, and thus only
one type of media (agnostic). We have hybrid HDDs that have flash on
them, but the LBA space isn't separate, so the FS or the DM couldn't
very easily treat them as 2 devices.

Also, talk in the standards committee has resulted in the allowance of
zero or more zones to be conventional PMR formatted vs SMR. The idea
is the first zone on the disk. That doesn't help: 1) because of the
GPT table there, and 2) partitions can be anywhere on the disk. This
is set at manufacture time, and is not a change that can be made in
the field.

> That would allow writing metadata (superblock, bitmap, group
> descriptor, inode table, journal, in decreasing order of importance)
> in random order instead of imposing possibly painful read-modify-write
> or COW semantics on the whole filesystem.

Yeah. Big design point. For backwards compatibility, 1) the superblock
must reside in known locations, and 2) any location change in metadata
would (eventually) require the superblock to be written in place. As
such the bitmaps are almost constantly updated, either in-place, or by
passing the in-place update through the group descriptor to the
superblock. To make data more linear, I'm coding the data bitmap to
mirror the write pointer information from the disk. That would make
the update of the data bitmap, while not trivial, much less important.

For the metadata, I'm exploring the idea of putting a scratchpad in
the device mapper to hold a zone's worth of data to be
compacted/rewritten in place. That will require some thought. We
should get to coding that in a couple of weeks.

> As for the journal, I think it would be possible to handle that in a
> way that is very SMR friendly.
> It is written in linear order, and if mke2fs can size/align the
> journal file with SMR write regions then the only thing that needs to
> happen is to size/align journal transactions and the journal
> superblock with SMR write regions as well.

Agreed. A circular buffer would be nice, but that's in ZACv2. In the
meantime, I'm looking at using 2 zones as a buffer, freeing one while
using the other, both in forward-only writes. I remember Ts'o had a
proposal out for the journal. We intend to rely on that when we get to
the journal (unfortunately, some time after LSF/MM).

> I saw on your SMR_FS-EXT4 README that you are looking at an 8KB
> sector size. Please correct my poor understanding of SMR, but isn't
> 8KB a lot smaller than the actual erase block size (or chunks, or
> whatever they are named)? I thought the erase blocks were on the
> order of MB in size?

SMR doesn't use erase blocks like flash. The idea is a zone, but I
admit it is similar. Zones are (currently) 256MiB, and nothing in the
standard requires this size -- it can change or even be irregular.
The current BG max size is 128MiB (4k). An 8K cluster allows a BG to
match a zone in size -- a new BG doesn't (can't) start in the middle
of a zone. Also, the BG/zone can be managed as a single unit for
purposes of file collocation/defragmentation. The ResetWritePointer
command acts like an eraser, zeroing out the BG (using the same code
path as discard and trim). The difference is that the FS is now aware
of the state of the zone, using the information to make write
decisions -- and is NOT media agnostic anymore.

> Are you already aware of the "bigalloc" feature? That may provide
> most of what you need already. It may be appropriate to default to
> e.g. 1MB bigalloc size for SMR drives, so that it is clear to users
> that the effective IO/allocation size is large for that filesystem.

We've looked at this, and found several problems. The biggest is that
it is still experimental, and it requires extents.
SMR HA and HM don't like extents, as that requires a backward write.
We are looking at a combination of code to scavenge from flex_bg and
meta_bg to create the large BG and move the metadata around on the
disk. We are finding that the developer resources required on that
path are MUCH less -- LSF/MM is only 2 months away.

Thanks again for the questions,

Adrian

>> On Tue, Jan 6, 2015 at 4:29 PM, Adrian Palmer <adrian.palmer@seagate.com> wrote:
>>> I agree wholeheartedly with Dr. Reinecke in discussing what is
>>> becoming my favourite topic also. I support the need for generic
>>> filesystem support with SMR and ZAC/ZBC drives.
>>>
>>> Dr. Reinecke has already proposed a discussion on the ZAC/ZBC
>>> implementation. As a complementary topic, I want to discuss the
>>> generic filesystem support for Host Aware (HA) / Host Managed (HM)
>>> drives.
>>>
>>> We at Seagate are developing an SMR Friendly File System (SMRFFS)
>>> for this very purpose. Instead of a new filesystem with a long
>>> development time, we are implementing it as an HA extension to EXT4
>>> (and it WILL be backwards compatible with minimal code paths). I'll
>>> be talking about the on-disk changes we need to consider as well as
>>> the needed kernel changes common to all generic filesystems. Later,
>>> we intend to evaluate the work for use in other filesystems and
>>> kernel processes.
>>>
>>> I'd like to host a discussion of SMRFFS and ZAC for consumer and
>>> cloud systems at LSF/MM. I want to gather community consensus at
>>> LSF/MM of the required technical kernel changes before this topic
>>> is presented at Vault.
>>> Subtopics:
>>>
>>> On-disk metadata structures and data algorithms
>>> Explicit in-order write requirement and a look at the IO stack
>>> New IOCTLs to call from the FS and the need to know about the
>>> underlying disk -- no longer completely disk agnostic
>>>
>>> Adrian Palmer
>>> Firmware Engineer II
>>> R&D Firmware
>>> Seagate, Longmont Colorado
>>> 720-684-1307
>
> Cheers, Andreas
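The block-group sizing Adrian describes works out with simple
arithmetic (a sketch, assuming a 4 KiB filesystem block, one
bitmap block per group with one bit per cluster as in bigalloc, and
256 MiB zones; the helper is illustrative, not SMRFFS code):

```python
# Sketch: an ext4 block group's coverage is capped by its one-block
# block bitmap -- one bit per allocation unit -- so doubling the
# cluster size from 4 KiB to 8 KiB doubles the group from 128 MiB
# to 256 MiB, matching a 256 MiB zone.
BLOCK_BYTES = 4096                      # assumed 4 KiB filesystem block

def group_coverage_bytes(cluster_bytes, block_bytes=BLOCK_BYTES):
    bits_per_bitmap_block = block_bytes * 8   # one bit per cluster
    return bits_per_bitmap_block * cluster_bytes

MiB = 1024 * 1024
print(group_coverage_bytes(4096) // MiB)   # 128 (current BG max)
print(group_coverage_bytes(8192) // MiB)   # 256 (one zone per BG)
```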
* Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Alireza Haghdoost @ 2015-02-15 20:27 UTC (permalink / raw)
To: Adrian Palmer; +Cc: Andreas Dilger, ext4 development, Linux Filesystem Development List

>> I think one of the important design decisions that needs to be made
>> early on is whether it is possible to directly access some storage
>> that can be updated with small random writes (either a separate flash
>> LUN on the device, or a section of the disk that is formatted for 4kB
>> sectors without SMR write requirements).
>
> This would be nice, but I'm looking more generally at what I call
> 'single disk' systems. Several more complicated FSs use a separate
> flash drive for this purpose, but ext4 expects 1 vdev, and thus only
> one type of media (agnostic). We have hybrid HDDs that have flash on
> them, but the LBA space isn't separate, so the FS or the DM couldn't
> very easily treat them as 2 devices.

Adrian,

What if the vdev exposed to ext4 is composed of an md device instead
of a regular block device? In other words, how do you see these
changes in the EXT4 file system applying to a software RAID array of
SMR drives?

--Alireza
* Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

From: Adrian Palmer @ 2015-02-16 5:02 UTC (permalink / raw)
To: Alireza Haghdoost; +Cc: Andreas Dilger, ext4 development, Linux Filesystem Development List

That is an issue that is on deck to explore further. The DM needs to
manage each disk independently, but aggregate them and present it as 1
vdev. The trick to be figured out is how it mixes the disks in a
ZBD-aware way. Stripes of 256MiB are easily handled, but impractical.
Stripes of 128k are practical, but not easily handled.

I see the changes that we're exploring/implementing as working on both
an SMR drive and a conventional drive. ZBD does not require SMR, so
the superblock and group descriptor changes should not affect
conventional drives. In fact, the gd will mark the bg as a
conventional zone by default, but the drive can still use the ZBD
changes (forward-write and defragmentation) without the write pointer
information. EXT4 will need to be forward-write only as per SMR/ZBD.
If working with a combination of drives with small stripe sizes,
re-writes would work on one drive (conventional) but not the other
(SMR).

The bulk of the change would need to be in the DM, and will likely not
bleed over to the FS. The exception I can see is that the bg size may
need to increase to accommodate multiple zones on multiple SMR drives
(768MiB or 1GiB BGs for RAID5). The DM would be responsible for
aggregating the REPORT_ZONE data before presenting it to the FS (which
would behave as normally expected). Of note, the standard requires the
zone size to be a power of 2, so a 3-disk RAID5 may violate that on
implementation. RAID0 has similar constraints, and RAID1 can operate
in the same paradigm with no changes to zone information.

So, in short, the DM would have to be modified to pass the aggregated
zone information up to EXT4.
I don't see much divergence in the proposed redesign of EXT4.

Adrian Palmer
Firmware Engineer II
R&D Firmware
Seagate, Longmont Colorado
720-684-1307

On Sun, Feb 15, 2015 at 1:27 PM, Alireza Haghdoost <haghdoost@gmail.com> wrote:
>>> I think one of the important design decisions that needs to be made
>>> early on is whether it is possible to directly access some storage
>>> that can be updated with small random writes (either a separate
>>> flash LUN on the device, or a section of the disk that is formatted
>>> for 4kB sectors without SMR write requirements).
>>
>> This would be nice, but I'm looking more generally at what I call
>> 'single disk' systems. Several more complicated FSs use a separate
>> flash drive for this purpose, but ext4 expects 1 vdev, and thus only
>> one type of media (agnostic). We have hybrid HDDs that have flash on
>> them, but the LBA space isn't separate, so the FS or the DM couldn't
>> very easily treat them as 2 devices.
>
> Adrian,
>
> What if the vdev exposed to ext4 is composed of an md device instead
> of a regular block device? In other words, how do you see these
> changes in the EXT4 file system applying to a software RAID array of
> SMR drives?
>
> --Alireza
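The power-of-2 constraint on the aggregated zone size can be checked
with a quick sketch (illustrative helper, not DM code; assumes 256 MiB
native zones and that the DM presents one zone's worth of data per
data disk as a single aggregated zone):

```python
# Sketch: the zone size a striping DM target would present to the FS.
ZONE_BYTES = 256 * 1024 * 1024   # assumed native zone size

def aggregated_zone_bytes(n_disks, raid_level, zone_bytes=ZONE_BYTES):
    if raid_level == 0:
        data_disks = n_disks            # all disks carry data
    elif raid_level == 1:
        data_disks = 1                  # mirror: native zone size survives
    elif raid_level == 5:
        data_disks = n_disks - 1        # one disk's worth of parity
    else:
        raise ValueError("unhandled RAID level")
    return data_disks * zone_bytes

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

# RAID1 keeps 256 MiB (a power of two); three data disks -- e.g. a
# 3-disk RAID0 or a 4-disk RAID5 -- yield 768 MiB, which is not.
for disks, level in [(2, 1), (3, 0), (4, 5)]:
    size = aggregated_zone_bytes(disks, level)
    print(disks, "disks, RAID", level, "->",
          size // (1024 * 1024), "MiB, power of two:",
          is_power_of_two(size))
```

How parity placement maps onto zones is an open question in the
thread, so the data-disk counts above are only one possible model.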
Thread overview: 7+ messages
2015-01-07 14:57 ` [Lsf-pc] [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems) Sasha Levin
2015-01-10 13:40   ` Hannes Reinecke
2015-01-13 20:32 ` Adrian Palmer
2015-01-13 21:50   ` Andreas Dilger
2015-01-13 23:26     ` Adrian Palmer
2015-02-15 20:27       ` Alireza Haghdoost
2015-02-16  5:02         ` Adrian Palmer