* XFS journaling position
@ 2010-10-25 5:26 klonatos
From: klonatos @ 2010-10-25 5:26 UTC (permalink / raw)
To: xfs
Hello,
I have recently discovered that the default journalling position of
XFS is the "middle" AG when the log is internal (e.g. for 32 AGs the
journal will be placed at AG 16). Although I am aware that it is
possible to explicitly specify a journal location using the
-l agnum=X mkfs option, I am really curious about the reasoning
behind the default choice.
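For reference, the kind of override I mean looks like this (the device
name and AG number here are hypothetical, just a sketch):

```shell
# Hypothetical: put the internal log in AG 8 instead of the default
# middle AG when making the filesystem.
mkfs.xfs -l internal=1,agnum=8 /dev/sdb1

# The chosen log start can be read back with xfs_db (read-only):
xfs_db -r -c "sb 0" -c "p logstart" /dev/sdb1
```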
Thanks in advance,
Yannis Klonatos
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS journaling position
From: Geoffrey Wehrman @ 2010-10-25 16:10 UTC (permalink / raw)
To: klonatos; +Cc: xfs

On Mon, Oct 25, 2010 at 08:26:04AM +0300, klonatos@ics.forth.gr wrote:
| I have recently discovered that the default journalling position of
| XFS is the "middle" AG when the log is internal (e.g. for 32 AGs the
| journal will be placed at AG 16). Although I am aware that it is
| possible to explicitly specify a journal location using the
| -l agnum=X mkfs option, I am really curious about the reasoning
| behind the default choice.

I haven't found any documentation to support this, but I have been
told that the middle was selected for the single-disk case to
minimize seek times. The distance the head must travel to or from
other locations on the disk is minimized by placing the log in the
middle of the disk.

--
Geoffrey Wehrman     SGI Building 10     Office: (651) 683-5496
2750 Blue Water Road                     Fax:    (651) 683-5098
Eagan, MN 55121      E-mail: gwehrman@sgi.com
http://www.sgi.com/products/storage/software/
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-26 22:27 UTC (permalink / raw)
To: xfs

On Montag, 25. Oktober 2010 Geoffrey Wehrman wrote:
> I haven't found any documentation to support this, but I have been
> told that the middle was selected for the single-disk case to
> minimize seek times. The distance the head must travel to or from
> other locations on the disk is minimized by placing the log in the
> middle of the disk.

This is based on the assumption that
1) the disk is completely filled with data, and
2) the partition that XFS resides on is the only one on the disk.

For PCs or small servers you normally have a swap partition; the
recommended size used to be twice the RAM (I've always set it to just
the RAM size). Then you have at least a root filesystem, which often
won't be XFS, and then the data partition, which could be a single
partition running to the end of the disk. For a single 300GB disk,
this means around 20-50GB will be used by "other" partitions (swap,
root, Windows?), then maybe a single 250-280GB partition for XFS.

If you place the journal in the middle of that partition, you are
already in an area where the disk is slower than at the outer edge.
And if you fill only half of the partition, your log ends up, access-
wise, entirely past the data. So for the average single-disk setup,
wouldn't a log at 25-35% of the partition size be quicker than one in
the middle of the partition? Dave Chinner just recently said that "a
partition which is 85% filled is full, from a filesystem view"; that
would already put the better log position at 42% of the partition
size. (Greetings to Douglas Adams, again he was right ;-)

Just my 2¢, but I don't have a single-disk machine with performance
needs.

--
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [gesprochen: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Radio interview on the subject of spam ******
http://www.it-podcast.at/archiv.html#podcast-100716

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
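The 85% argument can be put into numbers with a toy model (my own
sketch, assuming seek targets uniformly distributed over the used
fraction of the partition, with positions and distances expressed as
fractions of the partition size):

```python
def mean_seek_distance(log_pos: float, fill: float = 0.85) -> float:
    """Mean distance |x - log_pos| to a target x uniform on [0, fill].

    For 0 <= p <= f, E|x - p| = (p**2 + (f - p)**2) / (2 * f).
    """
    p, f = log_pos, fill
    assert 0.0 <= p <= f <= 1.0
    return (p**2 + (f - p)**2) / (2 * f)

# Log in the middle of the partition vs. the middle of the *used* area
# of an 85%-full filesystem (0.425 = 85% / 2):
print(mean_seek_distance(0.50))   # middle of the partition
print(mean_seek_distance(0.425))  # middle of the used area: shorter
```

The minimum of p² + (f - p)² is at p = f/2, which is where the "42% of
the partition" figure comes from.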
* Re: XFS journaling position
From: Dave Chinner @ 2010-10-26 23:28 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs

On Wed, Oct 27, 2010 at 12:27:26AM +0200, Michael Monnerie wrote:
> On Montag, 25. Oktober 2010 Geoffrey Wehrman wrote:
> > I haven't found any documentation to support this, but I have been
> > told that the middle was selected for the single-disk case to
> > minimize seek times. The distance the head must travel to or from
> > other locations on the disk is minimized by placing the log in the
> > middle of the disk.
>
> This is based on the assumption that
> 1) the disk is completely filled with data, and
> 2) the partition that XFS resides on is the only one on the disk.
>
> For PCs or small servers you normally have a swap partition; the
> recommended size used to be twice the RAM. Then you have at least a
> root filesystem, which often won't be XFS, and then the data
> partition, which could be a single partition running to the end of
> the disk. For a single 300GB disk, this means around 20-50GB will be
> used by "other" partitions (swap, root, Windows?), then maybe a
> single 250-280GB partition for XFS.

Doesn't matter - the placement of the log matters when the XFS
partition is busy, not when you are swapping or your root filesystem
is doing stuff. If you have traffic to multiple partitions at once,
then nothing you do to the filesystem layout will make much
difference.

> If you place the journal in the middle of the partition, you are
> already in an area where the disk is slower than at the outer edge.
> And if you fill only half of the partition, your log ends up,
> access-wise, entirely past the data.

Well, no. XFS does not fill the filesystem from the start to the end.
XFS spreads the data and metadata across all the allocation groups,
so there is generally as much metadata/data on either side of the log
at any given time. For example, a kernel tree on a 120GB filesystem
with 4 AGs has the following distribution of extents:

	AG	extents
	 0	10750
	 1	10730
	 2	 9994
	 3	10265

So you can see that with the log at the start of AG 2, there is a
pretty even chance of any given IO in that kernel tree falling on
either side of the log. The difference in sequential performance from
AG 0 to AG 3 on this drive is about 10MB/s, but the worst-case seek
time is around 18ms and the average is about 9ms. IOWs, the seek-time
reduction gained by placing the log in the middle is worth more than
the sequential-throughput improvement gained by placing it in AG 0.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
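A per-AG extent count like the one above can be approximated from
userspace; a rough sketch (assuming xfs_bmap's verbose output format,
with the AG number in the fourth column, and a hypothetical path):

```shell
# Count extents per allocation group for every file under a tree.
find /mnt/xfs/kernel-tree -type f -print0 |
    xargs -0 xfs_bmap -v 2>/dev/null |
    awk '$1 ~ /^[0-9]+:$/ && $3 != "hole" { n[$4]++ }
         END { for (ag in n) print ag, n[ag] }' |
    sort -n
```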
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-26 23:59 UTC (permalink / raw)
To: xfs

On Mittwoch, 27. Oktober 2010 Dave Chinner wrote:
> XFS does not fill the filesystem from the start to the end.

Nice to know, thanks!

--
Michael Monnerie, Ing. BSc
* Re: XFS journaling position
From: Robert Brockway @ 2010-10-27 14:27 UTC (permalink / raw)
To: xfs

On Wed, 27 Oct 2010, Michael Monnerie wrote:
> in an area where disks are slower than on the outside. And if you
> fill only half of the partition, it means your log is totally on the
> end of the disk, access-wise.

Hi Michael.  Ignoring LVM for a moment...

My understanding is that the relationship between the physical
cylinders and the logical cylinders software sees is no longer
necessarily linear - i.e., the lower-numbered logical cylinders aren't
necessarily at the outer edge.  This article talks about this:

http://lissot.net/partition/mapping.html

Unless you know the underlying physical cylinder layout, you can't
reliably position the journal on the middle cylinder of the physical
media.  I understand the disk drive manufacturers aren't necessarily
forthcoming with the necessary information, although it could be
established from timing tests.

Cheers,

Rob

--
Email: robert@timetraveller.org         Linux counter ID #16440
IRC: Solver (OFTC & Freenode)           Web: http://www.practicalsysadmin.com
Contributing member of Software in the Public Interest (http://spi-inc.org/)
Open Source: The revolution that silently changed the world
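Those timing tests can be as simple as reading a fixed amount at a few
offsets across the raw device; a sketch (run as root, /dev/sdX is a
placeholder, and direct I/O is used so the page cache doesn't hide the
zone differences):

```shell
DEV=/dev/sdX
SIZE_MB=$(( $(blockdev --getsize64 "$DEV") / 1048576 ))
# Read 256MB at 0%, 25%, 50%, 75% and 95% of the device and report
# the throughput dd prints for each zone.
for pct in 0 25 50 75 95; do
    printf 'offset %3d%%: ' "$pct"
    dd if="$DEV" of=/dev/null bs=1M count=256 iflag=direct \
       skip=$(( SIZE_MB * pct / 100 )) 2>&1 | tail -n 1
done
```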
* Re: XFS journaling position
From: Robert Brockway @ 2010-10-27 14:32 UTC (permalink / raw)
To: xfs

On Wed, 27 Oct 2010, Robert Brockway wrote:
> On Wed, 27 Oct 2010, Michael Monnerie wrote:
>> in an area where disks are slower than on the outside. And if you
>> fill only half of the partition, it means your log is totally on
>> the end of the disk, access-wise.
>
> Hi Michael.  Ignoring LVM for a moment...

*Sigh*.  I meant to go on to point out that LVM further abstracts away
the hardware, making it even harder to establish where the fastest
part of the physical disk is.  Similarly, virtual hosts have little
chance of establishing the physical nature of the device holding
their filesystems.

Cheers,

Rob
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-28 9:44 UTC (permalink / raw)
To: xfs; +Cc: Robert Brockway

On Mittwoch, 27. Oktober 2010 Robert Brockway wrote:
> Similarly, virtual hosts have little chance of establishing the
> physical nature of the device holding their filesystems.

Yes, performance optimization is going to be fun in the near future.
VMs, thin provisioning, NetApp's WAFL, LVM, funny disk layouts - all
of these can do things completely differently from our "old school"
thinking. I wonder when there will be an I/O scheduler that simply
passes the I/O from a VM up to the real host, so that the host itself
can optimize and align; after all, a VM has no idea of the storage.
That's why you can already choose "noop" as the scheduler in a VM. I
guess a "virtualized" scheduler will come eventually, but we will see.

--
Michael Monnerie, Ing. BSc
* Re: XFS journaling position
From: Stan Hoeppner @ 2010-10-28 23:33 UTC (permalink / raw)
To: xfs

Michael Monnerie put forth on 10/28/2010 4:44 AM:
> On Mittwoch, 27. Oktober 2010 Robert Brockway wrote:
>> Similarly, virtual hosts have little chance of establishing the
>> physical nature of the device holding their filesystems.
>
> Yes, performance optimization is going to be fun in the near future.
> VMs, thin provisioning, NetApp's WAFL, LVM, funny disk layouts - all
> of these can do things completely differently from our "old school"
> thinking.

I don't see how any of this is really that different from where we
already are with advanced storage systems and bare-metal host OSes.
We're already virtualized WRT basic SAN arrays, and maybe even some
PCIe RAID cards if they allow carving a RAID set into LUNs.

Take for example a small FC/iSCSI SAN array controller box with 16 x
1TB SATA drives. We initialize it, using the serial console, web GUI,
or other management tool, into a single RAID 6 array with 14TB of raw
space and a 256KB stripe size. We then carve this 14TB into 10 LUNs
of 1.4TB each, and unmask each LUN to the FC WWN of a bare-metal host
running Linux. Let's assume the array controller starts at the outer
edge of each disk and works its way toward the inner cylinders when
creating each LUN, which seems like a logical way for a vendor to
implement this. We now have 10 LUNs, each with progressively less
performance than the one preceding it due to its location on the
platters.

Now, on each host we format the 1.4TB LUN with XFS. In this
configuration, given that the LUNs are spread all across the platters,
from the outer to the inner cylinders, is it really going to matter,
from a performance standpoint, where each AG or the log is located?
The only parameters we actually know for sure here are the stripe
width (14) and the stripe size (256KB). We have no knowledge of the
real layout of the cylinders when we run mkfs.xfs.

So as we move to a totally virtualized guest OS, we then lose the
stripe width and stripe size information as well. How much performance
does this really cost us WRT XFS filesystem layout? And considering
these are VM guests, which are by design meant for consolidation, not
necessarily performance, are we really losing anything at all when
looking at the big picture? How many folks are running their critical
core business databases in virtual machine guests? How about core
email systems? Other performance/business-critical applications?

--
Stan
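Stripe geometry can still be handed to mkfs.xfs manually when the
admin knows it out-of-band; assuming the 256KB figure above is the
per-disk stripe unit and 14 of the 16 drives carry data in RAID 6, a
hypothetical invocation (device name is a placeholder) would be:

```shell
# su = per-disk stripe unit, sw = number of data disks in the stripe.
mkfs.xfs -d su=256k,sw=14 /dev/mapper/lun0
```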
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-29 7:58 UTC (permalink / raw)
To: xfs; +Cc: Stan Hoeppner

On Freitag, 29. Oktober 2010 Stan Hoeppner wrote:
> So as we move to a totally virtualized guest OS, we then lose the
> stripe width and stripe size information as well. How much
> performance does this really cost us WRT XFS filesystem layout?

That is still a very straightforward config. When you use a NetApp
storage system with virtualization, you normally use thin
provisioning. So you put one VM with its root FS on that system, and
assign it a data disk of 1.4TB. You then "flexclone" that VM 9 times,
so you have 10 VMs running but they only use the disk space of one.
Then you upgrade or modify one VM, and only the blocks that are
modified get copied on the disks. You fill 5 VMs in parallel with
data, and it gets stored totally "fragmented" on the storage. Now take
a snapshot of a VM, so its old contents are frozen and every write
goes to a new place. There are companies who snapshot every VM every
hour, so their users can recover files they wrongly deleted or
modified by themselves.

And now take into account that NetApp uses WAFL - the write-anywhere
file layout. It means that even if a VM is laid out straight on disk,
as soon as you modify a block it can be written to a totally different
place on the storage. I'd say this is such a "total mess" that you
know exactly *nothing* about the layout of anything. You cannot even
optimize for stripe size.

> How many folks are running their critical core business databases in
> virtual machine guests? How about core email systems? Other
> performance/business-critical applications?

I don't know which country you are from; I'm from Austria/Europe (not
the kangaroo country :-). At every single tech talk and presentation I
attended this year, every single speaker talked about virtualization.
Depending on who spoke, either 2009 or 2010 was the year in which more
virtual servers were deployed than physical ones, with a sharp
increase each year. We didn't sell a single server running a
bare-metal OS; all had VMware or XenServer. And as for business
critical: SAP uses virtualization everywhere in-house, and ÖBB
(Austrian Railways) has run its central accounting program exclusively
on Xen since 2001. Those are just the two companies who spoke at last
week's presentation; there are far more.

--
Michael Monnerie, Ing. BSc