* op-journaled fs, journal size and storage speeds @ 2011-04-30 14:51 Peter Grandi 2011-05-01 9:27 ` Dave Chinner 0 siblings, 1 reply; 6+ messages in thread
From: Peter Grandi @ 2011-04-30 14:51 UTC (permalink / raw)
To: Linux fs XFS, Linux fs JFS

Been thinking about journals and RAID6s and SSDs. In particular for file system designs like JFS and XFS that do operation journaling (while ext[34] do block journaling).

The issue is: journal size?

It seems to me that adopting as a guideline a percentage of the filesystem is very wrong, and so I have been using a rule of thumb like one second of expected transfer rate, so "in flight" updates are never much behind.

But even at a single-disk *sequential* transfer rate of, say, 80MB/s average, a journal that contains operation records could conceivably hold dozens if not hundreds of thousands of pending metadata updates, probably targeted at very widely scattered locations on disk, and playing a journal fully could take a long time.

So the idea would be that the relevant transfer rate is the *random* one, and since that is around 4MB/s per single disk, journal sizes would end up pretty small. But many people allocate very large (at least compared to that) journals.

This seems to me a fairly bad idea, because then the journal becomes a massive hot spot on the disk and draws the disk arm like a black hole. I suspect that operations should not stay in the journal for a long time. However, if the journal is too small, processes that do metadata updates start to hang on it.

So some questions for which I have guesses but not good answers:

* What should journal size be proportional to?
* What is the downside of a too-small journal?
* What is the downside of a too-large journal, other than space?

Again, I expect answers to be very different for ext[34], but I am asking about operation-journaling file system designs like JFS and XFS.
BTW, another consideration is that for filesystems that are fairly journal-intensive, putting the journal on a low-traffic storage device can have large benefits. But if journals can be pretty small, I wonder whether putting the journals of several filesystems on the same storage device then becomes a sensible option, as the locality will be quite narrow (e.g. a single physical cylinder), or whether it could be worthwhile, as the database people do, to journal to battery-backed RAM.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: op-journaled fs, journal size and storage speeds 2011-04-30 14:51 op-journaled fs, journal size and storage speeds Peter Grandi @ 2011-05-01 9:27 ` Dave Chinner 2011-05-01 18:13 ` Peter Grandi 2011-05-02 4:35 ` Stan Hoeppner 0 siblings, 2 replies; 6+ messages in thread
From: Dave Chinner @ 2011-05-01 9:27 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sat, Apr 30, 2011 at 03:51:43PM +0100, Peter Grandi wrote:

> Been thinking about journals and RAID6s and SSDs.
>
> In particular for file system designs like JFS and XFS that do
> operation journaling (while ext[34] do block journaling).

XFS is not an operation journalling filesystem. Most of the metadata is dirty-region logged via buffers, just like ext3/4. Perhaps you need to read some documentation like this:

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Operation_Based_Logging

> The issue is: journal size?
>
> It seems to me that adopting as guideline a percent of the
> filesystem is very wrong, and so I have been using a rule of
> thumb like one second of expected transfer rate, so "in flight"
> updates are never much behind.

How do you know what "one second" of "in flight" operations is going to be? I had to deal with this in XFS when implementing the delayed logging code. It uses a number of operations or a percentage of log space to determine when to checkpoint the modifications, and when that triggers is typically load dependent.

And then you've got the problem of concurrency - one second of a single-threaded workload is much different to one second of the same workload spread across 20 CPU cores. You need to have limits that work well in both cases, and structures that scale to that level of concurrency.
In reality, there's not much point in trying to calculate what one second's worth of metadata is going to be - more often than not you'll hit some other limitation in the journal subsystem, run out of memory, or have to put limits in place anyway to avoid latency problems. The easiest and most reliable method seems to be to size your journal appropriately in the first place and have your algorithms key off that....

> But even at a single disk *sequential* transfer rate of say
> 80MB/s average, a journal that contains operation records could
> conceivably hold dozens if not hundreds of thousands of pending
> metadata updates, probably targeted at very widely scattered
> locations on disk, and playing a journal fully could take a long
> time.

17 minutes is my current record, from crashing a VM during a chmod -R operation over a 100 million inode filesystem. That was on a ~2GB log (maximum supported size).

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Reducing_Recovery_Time

> So the idea would be that the relevant transfer rate would be
> the *random* one, and since that is around 4MB/s per single
> disk, journal sizes would end up pretty small. But many people
> allocate very large (at least compared to that) journals.
>
> This seems to me a fairly bad idea, because then the journal
> becomes a massive hot spot on the disk and draws the disk arm
> like a black hole. I suspect that operations should not stay on

That's why you can configure an external log....

> the journal for a long time. However if the journal is too small
> processes that do metadata updates start to hang on it.

Well, yes. The journal needs to be large enough to hold all the transaction reservations for the active transactions. XFS, in the worst case for a default filesystem config, needs about 100MB of log space per 300 concurrent transactions. Increasing transaction concurrency was the main reason we increased the log size...
> So some questions for which I have guesses but not good answers:
>
> * What should journal size be proportional to?

Your workload.

> * What is the downside of a too small journal?

Performance sucks.

> * What is the downside of a too large journal other than space?

Recovery times get too long, lots of outstanding metadata stays pinned in memory (hello OOM-killer!), and there are other resource-management-related scalability issues.

> Again I expect answers to be very different for ext[34] but I am
> asking for operation-journaling file system designs like JFS and
> XFS.

> BTW, another consideration is that for filesystems that are
> fairly journal-intensive, putting the journal on a low traffic
> storage device can have large benefits.

Yeah, nobody ever thought of an external log before.... :)

> But if they can be pretty small, I wonder whether putting the
> journals of several filesystems on the same storage device then
> becomes a sensible option as the locality will be quite narrow
> (e.g. a single physical cylinder) or it could be worthwhile like
> the database people do to journal to battery-backed RAM.

Got a supplier for the custom hardware you'd need? Just use a PCIe SSD....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: op-journaled fs, journal size and storage speeds 2011-05-01 9:27 ` Dave Chinner @ 2011-05-01 18:13 ` Peter Grandi 2011-05-02 1:23 ` Dave Chinner 2011-05-02 10:40 ` Christoph Hellwig 2011-05-02 4:35 ` Stan Hoeppner 1 sibling, 2 replies; 6+ messages in thread
From: Peter Grandi @ 2011-05-01 18:13 UTC (permalink / raw)
To: Dave Chinner; +Cc: Linux fs XFS, Linux fs JFS

>> Been thinking about journals and RAID6s and SSDs. In particular
>> for file system designs like JFS and XFS that do operation
>> journaling (while ext[34] do block journaling).

> XFS is not an operation journalling filesystem. Most of the
> metadata is dirty-region logged via buffers, just like ext3/4.

Looking at the sources, XFS does operation journaling, in the form of physical ("dirty region") operation logging, instead of logical operation logging like JFS. Both are very different from block journaling.

In more detail, to me there is a stark contrast between 'jbd.h':

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD

where I find only a few journal transaction types (blocks), and 'xfs_trans.h', where I find many journal transaction types (ops):

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Given that in the latter I see transaction types like 'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR', it is hard to imagine how one can argue that XFS journals something other than ops, even if in a buffered way of sorts. Ironically, comparing with 'jfs_logmgr.h':

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/jfs/jfs_logmgr.h;h=9236bc49ae7ff1aed9cad81a2b22c2c54e433ba0;hb=HEAD

I see lower-level transaction types there (but they are logged as ops rather than as "dirty regions").

[ ...
]

>> It seems to me that adopting as guideline a percent of the
>> filesystem is very wrong, and so I have been using a rule of
>> thumb like one second of expected transfer rate, so "in flight"
>> updates are never much behind.

> How do you know what "one second" of "in flight" operations is
> going to be?

Well, that's what I discuss later; it is a "rule of thumb" based on *some* rationale, but I have been questioning it.

[ ... interesting summary of some of the many issues related to journal sizing ... ]

> Easiest and most reliable method seems to be to size your
> journal appropriately in the first place and have your
> algorithms key off that....

Sure, but *I* am asking that question :-).

[ ... ]

> 17 minutes is my current record by crashing a VM during a
> chmod -R operation over a 100 million inode filesystem. That
> was on a ~2GB log (maximum supported size).

Uhhhm, I happen to strongly relate to that (on a much smaller scale :->).

[ ... ]

>> This seems to me a fairly bad idea, because then the journal
>> becomes a massive hot spot on the disk and draws the disk arm
>> like a black hole. I suspect that operations should not stay on

> That's why you can configure an external log....

...and lose barriers :-). But indeed.

>> the journal for a long time. However if the journal is too
>> small processes that do metadata updates start to hang on it.

> Well, yes. The journal needs to be large enough to hold all
> the transaction reservations for the active transactions. XFS,
> in the worst case for a default filesystem config, needs about
> 100MB of log space per 300 concurrent transactions. [ ... ]

So something like 300KB per transaction? That seems a pretty extreme worst case. How is that possible? A metadata transaction with a "dirty region" of 300KB sounds enormously expensive. It may be about extent maps for a very fragmented file, I guess. Also not clear here is what "concurrent" means, because the log is sequential; I'll guess that it means "in flight".

[ ...
]

>> * What should journal size be proportional to?

> Your workload.

Sure, as a very top-level goal. But that's not an answer, it is handwaving. As you argue earlier, it could be proportional in some cases to IO threads; or it could be number of arms, filesystem size, size of each volume, sequential transfer rate, random transfer rate, large-IO transfer rate, small-IO transfer rate, ... Some tighter guideline might be better than just guessing.

>> * What is the downside of a too small journal?

> Performance sucks.

But why? With no journal at all, performance is better; assuming a one-transaction journal, this becomes slower because of writing everything twice, but that happens for any size of journal, as it is unavoidable. When the journal fills up, the effect is the same as that of a one-transaction journal. That's the same for every type of buffer.

So the effect of a journal larger than one transaction must be felt only when the journal is not full, that is, when there are pauses in the flow of transactions; and then it does not matter a lot just how large the journal is.

So the journal should be large enough to accommodate the highest possible rate of metadata updates for the longest time this happens, until there is a pause in the metadata updates. This of course depends on workload, but some rule of thumb based on experience might help. And here my guess is that shorter journals are better than longer ones, because also:

>> * What is the downside of a too large journal other than space?

> Recovery times too long, lots of outstanding metadata pinned
> in memory (hello OOM-killer!), and other resource management
> related scalability issues.

I would have expected also more seeks, as reading logged but not yet finalized metadata has to go back to the journal, but I guess that's a small effect.

>> BTW, another consideration is that for filesystems that are
>> fairly journal-intensive, putting the journal on a low traffic
>> storage device can have large benefits.
> Yeah, nobody ever thought of an external log before.... :)

I was just stating the obvious here, in order to contrast it with:

>> But if they can be pretty small, I wonder whether putting the
>> journals of several filesystems on the same storage device then
>> becomes a sensible option as the locality will be quite narrow
>> (e.g. a single physical cylinder) or it could be worthwhile like
>> the database people do to journal to battery-backed RAM.

For example, as described in this old paper:

http://www.evenenterprises.com/SSDoracl.pdf

> Got a supplier for the custom hardware you'd need?

There are still a few, for example at different ends of the scale:

http://www.ramsan.com/solutions/oracle/
http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots

> Just use a PCIe SSD....

Yes, that's what many people are doing, but mostly for data rather than specifically for journals. As mentioned at the start, I have indeed been thinking of SSDs. But they seem to me fundamentally terrible for journals, because of the large erase block sizes and the enormous latency of erase operations (lots of read-erase-write cycles for small commits). They seem more oriented to large, mostly read-only data sets than to very small, mostly-write ones.

The saving grace is the capacitor-backed RAM in SSDs (used to work around erase block size issues, as you probably know), which to a significant extent may act as the battery-backed RAM I was mentioning; and similarly, as another post says, the battery-backed RAM in RAID host adapters would do much the same function. But neither does so as cleanly as a dedicated unit rather than a cache.

But as another contributor said, a fast/small-disk RAID1 might be quite decent in many situations.
* Re: op-journaled fs, journal size and storage speeds 2011-05-01 18:13 ` Peter Grandi @ 2011-05-02 1:23 ` Dave Chinner 2011-05-02 10:40 ` Christoph Hellwig 1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2011-05-02 1:23 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:

> >> Been thinking about journals and RAID6s and SSDs. In particular
> >> for file system designs like JFS and XFS that do operation
> >> journaling (while ext[34] do block journaling).
>
> > XFS is not an operation journalling filesystem. Most of the
> > metadata is dirty-region logged via buffers, just like ext3/4.
>
> Looking at the sources, XFS does operations journaling, in the
> form of physical ("dirty region") operation logging,

Operation logging contains no physical changes - it just indicates the change to be made, typically via an intent/done transaction pair. It says what is going to be done, then what has been done, but not the details of the changes made. XFS _always_ logs the details of the changes made, and....

> instead of
> logical operation logging like JFS. Both are very different from
> block journaling.

When you are dirtying entire blocks, then the way the blocks are logged is really no different to ext3/4's block logging...
> More in details, to me there is a stark contrast between 'jbd.h':
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD
>
> where I find only a few journal transaction types (blocks) and
> 'xfs_trans.h' where I find many journal transaction types (ops):
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Yeah, that number goes into the transaction header on disk mainly for debugging purposes - you can identify what operation triggered a transaction just by looking at the log. However, that is _completely ignored_ for delayed logging - you'll only ever see "checkpoint" transactions with delayed logging, as it throws away all the transaction-specific metadata in memory...

> Given that in the latter I see transaction types like
> 'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR' it is hard to imagine how
> one can argue that XFS journals something other than ops, even
> if in a buffered way of sorts.

Why don't you look at the transaction reservations that define what one of those "transaction ops" contains? e.g.
MKDIR uses the inode create reservation:

    /*
     * For create we can modify:
     *    the parent directory inode: inode size
     *    the new inode: inode size
     *    the inode btree entry: block size
     *    the superblock for the nlink flag: sector size
     *    the directory btree: (max depth + v2) * dir block size
     *    the directory inode's bmap btree: (max depth + v2) * block size
     * Or in the first xact we allocate some inodes giving:
     *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
     *    the superblock for the nlink flag: sector size
     *    the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
     *    the inode btree: max depth * blocksize
     *    the allocation btrees: 2 trees * (max depth - 1) * block size
     */
    STATIC uint
    xfs_calc_create_reservation(
            struct xfs_mount        *mp)
    {
            return XFS_DQUOT_LOGRES(mp) +
                    MAX((mp->m_sb.sb_inodesize +
                         mp->m_sb.sb_inodesize +
                         mp->m_sb.sb_sectsize +
                         XFS_FSB_TO_B(mp, 1) +
                         XFS_DIROP_LOG_RES(mp) +
                         128 * (3 + XFS_DIROP_LOG_COUNT(mp))),
                        (3 * mp->m_sb.sb_sectsize +
                         XFS_FSB_TO_B(mp, XFS_IALLOC_BLOCKS(mp)) +
                         XFS_FSB_TO_B(mp, mp->m_in_maxlevels) +
                         XFS_ALLOCFREE_LOG_RES(mp, 1) +
                         128 * (2 + XFS_IALLOC_BLOCKS(mp) +
                                mp->m_in_maxlevels +
                                XFS_ALLOCFREE_LOG_COUNT(mp, 1))));
    }

> > How do you know what "one second" of "in flight" operations is
> > going to be?
>
> Well, that's what I discuss later, it is a "rule of thumb" based
> on *some* rationale, but I have been questioning it.
>
> [ ... interesting summary of some of the many issues related to
> journal sizing ... ]
>
> > Easiest and most reliable method seems to be to size your
> > journal appropriately in the first place and have your
> > algorithms key off that....
>
> Sure, but *I* am asking that question :-).

And my response is that there is no one correct answer, and that physical limits are usually the issue...

> >> This seems to me a fairly bad idea, because then the journal
> >> becomes a massive hot spot on the disk and draws the disk arm
> >> like a black hole.
> >> I suspect that operations should not stay on
>
> > That's why you can configure an external log....
>
> ...and lose barriers :-). But indeed.

As always, if performance and data safety are your concerns, spend a few hundred dollars more and buy a decent HW RAID card with a BBWC....

> >> the journal for a long time. However if the journal is too
> >> small processes that do metadata updates start to hang on it.
>
> > Well, yes. The journal needs to be large enough to hold all
> > the transaction reservations for the active transactions. XFS,
> > in the worst case for a default filesystem config, needs about
> > 100MB of log space per 300 concurrent transactions. [ ... ]
>
> So something like 300KB per transaction?

Yup. And the size is dependent on filesystem block size and on filesystem and AG size (max btree depths). So for a 64k block size filesystem, that 300KB transaction reservation blows out to about 3MB....

> That seems a pretty
> extreme worst case. How is that possible? A metadata transaction
> with a "dirty region" of 300KB sounds enormously expensive. It may
> be about extent maps for a very fragmented file I guess.

It's actually very small. Have you ever looked at how much metadata a directory contains? The rule of thumb is that a directory consumes about 100MB of metadata for every million entries, for average-length filenames. Having a create transaction consume at most 300KB for a worst-case modification of a directory with a million, 10M or 100M entries makes that 300KB look pretty small...

> Also not
> clear here what concurrent means because the log is sequential.
> I'll guess that it means "in flight".
>
> [ ... ]
>
> >> * What should journal size be proportional to?
>
> > Your workload.
>
> Sure, as a very top level goal. But that's not an answer, it is
> handwaving.
> As you argue earlier, it could be proportional in some
> cases to IO threads; or it could be number of arms, filesystem
> size, size of each volume, sequential transfer rate, random
> transfer rate, large IO transfer rate, small IO transfer rate, ...

Nice definition of "workload dependent".

> Some tighter guideline might be better than just guessing.
>
> >> * What is the downside of a too small journal?
>
> > Performance sucks.
>
> But why? Without a journal completely performance is better;
> assuming a one-transaction journal this becomes slower because
> of writing everything twice, but that happens for any size of
> journal, as it is unavoidable.

Why does having a writeback cache improve performance? Larger journals enable longer caching of dirty metadata before writeback must occur.

> When the journal fills up the effect is the same as that of a 1
> transaction journal. That's the same for every type of buffer.

And then you've got the problem of having to wait for those 10 objects to complete IO before you can do another transaction, while if you have a large log, you can push on it before you run out of space, to try to ensure it never stalls. And when you have 100,000 metadata objects to write back, you can optimise the IO a whole lot better than when you only have 10 objects.

> So the effect of a journal larger than 1 transaction must be
> felt only when the journal is not full,

Sure, and we've spent years optimising the metadata flushing to ensure we empty the log as fast as possible under sustained workloads. You need enough space in the journal to decouple transactions from the flow of metadata writeback - how much is very workload dependent.

> that is there are pauses
> in the flow of transactions; and then it does not matter a lot
> just how large the journal is.
>
> So the journal should be large enough to accommodate the highest
> possible rate of metadata updates for the longest time this
> happens until there is a pause in the metadata updates.
We need to be able to sustain hundreds of thousands of transactions per second, every second, 24x7. There are no "pauses" we can take advantage of to "catch up" - metadata writeback must take place simultaneously with new transactions, and the journal must be large enough to decouple these effectively.

> This of course depends on workload, but some rule of thumb based
> on experience might help.

Sure - we encode that experience in the mkfs and kernel default behaviour.

> And here my guess is that shorter journals are better than
> longer ones, because also:
>
> >> * What is the downside of a too large journal other than space?
>
> > Recovery times too long, lots of outstanding metadata pinned
> > in memory (hello OOM-killer!), and other resource management
> > related scalability issues.
>
> I would have expected also more seeks, as reading logged but not
> yet finalized metadata has to go back to the journal, but I guess
> that's a small effect.

Say what? Nobody reads from the journal except during recovery. Anything that is in the journal is dirty in memory, so any reads come from the in-memory objects, not the journal....

> > Got a supplier for the custom hardware you'd need?
>
> There are still a few, for example at different ends of the scale:
>
> http://www.ramsan.com/solutions/oracle/
> http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots

Neither of them is what I'd consider "battery backed RAM" - to the filesystem they are simply fast block devices behind a SATA/SAS/FC interface, effectively no different to a SAS/SATA/FC- or PCIe-based flash SSD.

> But as another contributor said a fast/small disk RAID1 might be
> quite decent in many situations.

Not fast enough for an XFS log - I can push >500MB/s through the XFS journal on a device (12-disk (7200rpm) RAID-0) that will do 700MB/s for sequential data IO.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: op-journaled fs, journal size and storage speeds 2011-05-01 18:13 ` Peter Grandi 2011-05-02 1:23 ` Dave Chinner @ 2011-05-02 10:40 ` Christoph Hellwig 1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-05-02 10:40 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:

> > That's why you can configure an external log....
>
> ...and lose barriers :-). But indeed.

Using a writeback cache on the log device is rather pointless, as every write needs write-through semantics using FUA or a post-flush anyway. But I actually have a patch to allow for devices with a writeback cache in external log configurations; it's just a bit complicated, as we basically need to copy the pre-flush state machine into XFS to deal with the pre-flush being for a different device than the actual write.

> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile like
> >> the database people do to journal to battery-backed RAM.
>
> For example as described in this old paper:

It only makes sense if the log activity bursts for the different filesystems happen at different times, or none of the filesystems maxes out the log IOP rate.

> But they seem to me fundamentally terrible for journals, because
> of the large erase block sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.

As mentioned earlier in this thread, XFS allows log writes to be aligned and padded. Just make sure to get a device with an erase block size <= 256 kilobytes, which usually means SLC.
But even drives with a larger erase block size and sane firmware tend to be faster than plain old disks. As Dave mentioned, though, there's nothing that's going to beat battery-backed cache/memory for log IOP performance.

> The saving grace is the capacitor-backed RAM in SSDs (used to work
> around erase block size issues as you probably know) which to a
> significant extent may act as the battery-backed RAM I was
> mentioning; and similarly as another post says the battery-backed
> RAM in RAID host adapters would do much the same function.

Just make sure your device actually has it. Both the Intel X25 SSDs and many other consumer / prosumer SSDs actually don't, and will lose data in case of a power loss.
* Re: op-journaled fs, journal size and storage speeds 2011-05-01 9:27 ` Dave Chinner 2011-05-01 18:13 ` Peter Grandi @ 2011-05-02 4:35 ` Stan Hoeppner 1 sibling, 0 replies; 6+ messages in thread
From: Stan Hoeppner @ 2011-05-02 4:35 UTC (permalink / raw)
To: xfs

On 5/1/2011 4:27 AM, Dave Chinner wrote:

> Got a supplier for the custom hardware you'd need? Just use a PCIe
> SSD....

50GB OCZ RevoDrive PCIe x4 SSD
MLC NAND
Dual SandForce 1200 controllers, internal RAID 0 design
70,000 write IOPS, 4KB aligned
350MB/s sustained write
$200 USD at Newegg: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227596

Currently the best value in a PCIe SSD suitable for dedicated log drive use; it can fit ~22 maximum-size (2GB) XFS logs. Note the MLC NAND. If all your filesystems will sustain constant high-rate metadata writes, an SLC-based product is more suitable, though prices are 10-50x higher for PCIe SLC cards.

If you want/need the 10x increase in flash cell life of SLC NAND, go with this Intel SLC SATAII SSD for ~2x the $$ of the Revo. Note its write IOPS are 'only' 33k, and at 32GB it is 18GB smaller.

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013

--
Stan
end of thread, other threads:[~2011-05-02 10:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-30 14:51 op-journaled fs, journal size and storage speeds Peter Grandi
2011-05-01  9:27 ` Dave Chinner
2011-05-01 18:13   ` Peter Grandi
2011-05-02  1:23     ` Dave Chinner
2011-05-02 10:40     ` Christoph Hellwig
2011-05-02  4:35   ` Stan Hoeppner