* XFS journaling position
@ 2010-10-25 5:26 klonatos
From: klonatos @ 2010-10-25 5:26 UTC (permalink / raw)
To: xfs
Hello,
I have recently discovered that the default journalling position of
XFS is the "middle" AG when the log is internal (e.g. for 32 AGs the
journal will be placed at AG 16). Although I am aware that it is
possible to explicitly specify a journal location using the
-l agnum=X mkfs option, I am really curious about the reasoning
behind the default choice.
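For reference, the kind of override I mean looks like this (the device
name and AG number here are hypothetical, just a sketch):

```shell
# Hypothetical: put the internal log in AG 8 instead of the default
# middle AG when making the filesystem.
mkfs.xfs -l internal=1,agnum=8 /dev/sdb1

# The chosen log start can be read back with xfs_db (read-only):
xfs_db -r -c "sb 0" -c "p logstart" /dev/sdb1
```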
Thanks in advance,
Yannis Klonatos
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS journaling position
From: Geoffrey Wehrman @ 2010-10-25 16:10 UTC (permalink / raw)
To: klonatos; +Cc: xfs

On Mon, Oct 25, 2010 at 08:26:04AM +0300, klonatos@ics.forth.gr wrote:
| I have recently discovered that the default journalling position of
| XFS is the "middle" AG when the log is internal (e.g. for 32 AGs the
| journal will be placed at AG 16). Although I am aware that it is
| possible to explicitly specify a journal location using the
| -l agnum=X mkfs option, I am really curious about the reasoning
| behind the default choice.

I haven't found any documentation to support this, but I have been
told that the middle was selected for the single-disk case to
minimize seek times. The distance the head must travel to or from
other locations on the disk is minimized by placing the log in the
middle of the disk.

--
Geoffrey Wehrman     SGI Building 10     Office: (651) 683-5496
2750 Blue Water Road                     Fax:    (651) 683-5098
Eagan, MN 55121      E-mail: gwehrman@sgi.com
http://www.sgi.com/products/storage/software/
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-26 22:27 UTC (permalink / raw)
To: xfs

On Montag, 25. Oktober 2010 Geoffrey Wehrman wrote:
> I haven't found any documentation to support this, but I have been
> told that the middle was selected for the single-disk case to
> minimize seek times. The distance the head must travel to or from
> other locations on the disk is minimized by placing the log in the
> middle of the disk.

This is based on the assumption that
1) the disk is completely filled with data, and
2) the partition that XFS resides on is the only one on the disk.

For PCs or small servers you normally have a swap partition; the
recommended size used to be twice the RAM (I've always set it to just
the RAM size). Then you have at least a root filesystem, which often
won't be XFS, and then the data partition, which could be a single
partition running to the end of the disk. For a single 300GB disk,
this means around 20-50GB will be used by "other" partitions (swap,
root, Windows?), then maybe a single 250-280GB partition for XFS.

If you place the journal in the middle of that partition, you are
already in an area where the disk is slower than at the outer edge.
And if you fill only half of the partition, your log ends up, access-
wise, entirely past the data. So for the average single-disk setup,
wouldn't a log at 25-35% of the partition size be quicker than one in
the middle of the partition? Dave Chinner just recently said that "a
partition which is 85% filled is full, from a filesystem view"; that
would already put the better log position at 42% of the partition
size. (Greetings to Douglas Adams, again he was right ;-)

Just my 2¢, but I don't have a single-disk machine with performance
needs.

--
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [gesprochen: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Radio interview on the subject of spam ******
http://www.it-podcast.at/archiv.html#podcast-100716

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
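The 85% argument can be put into numbers with a toy model (my own
sketch, assuming seek targets uniformly distributed over the used
fraction of the partition, with positions and distances expressed as
fractions of the partition size):

```python
def mean_seek_distance(log_pos: float, fill: float = 0.85) -> float:
    """Mean distance |x - log_pos| to a target x uniform on [0, fill].

    For 0 <= p <= f, E|x - p| = (p**2 + (f - p)**2) / (2 * f).
    """
    p, f = log_pos, fill
    assert 0.0 <= p <= f <= 1.0
    return (p**2 + (f - p)**2) / (2 * f)

# Log in the middle of the partition vs. the middle of the *used* area
# of an 85%-full filesystem (0.425 = 85% / 2):
print(mean_seek_distance(0.50))   # middle of the partition
print(mean_seek_distance(0.425))  # middle of the used area: shorter
```

The minimum of p² + (f - p)² is at p = f/2, which is where the "42% of
the partition" figure comes from.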
* Re: XFS journaling position
From: Dave Chinner @ 2010-10-26 23:28 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs

On Wed, Oct 27, 2010 at 12:27:26AM +0200, Michael Monnerie wrote:
> On Montag, 25. Oktober 2010 Geoffrey Wehrman wrote:
> > I haven't found any documentation to support this, but I have been
> > told that the middle was selected for the single-disk case to
> > minimize seek times. The distance the head must travel to or from
> > other locations on the disk is minimized by placing the log in the
> > middle of the disk.
>
> This is based on the assumption that
> 1) the disk is completely filled with data, and
> 2) the partition that XFS resides on is the only one on the disk.
>
> For PCs or small servers you normally have a swap partition; the
> recommended size used to be twice the RAM. Then you have at least a
> root filesystem, which often won't be XFS, and then the data
> partition, which could be a single partition running to the end of
> the disk. For a single 300GB disk, this means around 20-50GB will be
> used by "other" partitions (swap, root, Windows?), then maybe a
> single 250-280GB partition for XFS.

Doesn't matter - the placement of the log matters when the XFS
partition is busy, not when you are swapping or your root filesystem
is doing stuff. If you have traffic to multiple partitions at once,
then nothing you do to the filesystem layout will make much
difference.

> If you place the journal in the middle of the partition, you are
> already in an area where the disk is slower than at the outer edge.
> And if you fill only half of the partition, your log ends up,
> access-wise, entirely past the data.

Well, no. XFS does not fill the filesystem from the start to the end.
XFS spreads the data and metadata across all the allocation groups,
so there is generally as much metadata/data on either side of the log
at any given time. For example, a kernel tree on a 120GB filesystem
with 4 AGs has the following distribution of extents:

	AG	extents
	 0	10750
	 1	10730
	 2	 9994
	 3	10265

So you can see that with the log at the start of AG 2, there is a
pretty even chance of any given IO in that kernel tree falling on
either side of the log. The difference in sequential performance from
AG 0 to AG 3 on this drive is about 10MB/s, but the worst-case seek
time is around 18ms and the average is about 9ms. IOWs, the seek-time
reduction gained by placing the log in the middle is worth more than
the sequential-throughput improvement gained by placing it in AG 0.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
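A per-AG extent count like the one above can be approximated from
userspace; a rough sketch (assuming xfs_bmap's verbose output format,
with the AG number in the fourth column, and a hypothetical path):

```shell
# Count extents per allocation group for every file under a tree.
find /mnt/xfs/kernel-tree -type f -print0 |
    xargs -0 xfs_bmap -v 2>/dev/null |
    awk '$1 ~ /^[0-9]+:$/ && $3 != "hole" { n[$4]++ }
         END { for (ag in n) print ag, n[ag] }' |
    sort -n
```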
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-26 23:59 UTC (permalink / raw)
To: xfs

On Mittwoch, 27. Oktober 2010 Dave Chinner wrote:
> XFS does not fill the filesystem from the start to the end.

Nice to know, thanks!

--
Michael Monnerie, Ing. BSc
* Re: XFS journaling position
From: Robert Brockway @ 2010-10-27 14:27 UTC (permalink / raw)
To: xfs

On Wed, 27 Oct 2010, Michael Monnerie wrote:
> in an area where disks are slower than on the outside. And if you
> fill only half of the partition, it means your log is totally on the
> end of the disk, access-wise.

Hi Michael.  Ignoring LVM for a moment...

My understanding is that the relationship between the physical
cylinders and the logical cylinders software sees is no longer
necessarily linear - i.e., the lower-numbered logical cylinders aren't
necessarily at the outer edge.  This article talks about this:

http://lissot.net/partition/mapping.html

Unless you know the underlying physical cylinder layout, you can't
reliably position the journal on the middle cylinder of the physical
media.  I understand the disk drive manufacturers aren't necessarily
forthcoming with the necessary information, although it could be
established from timing tests.

Cheers,

Rob

--
Email: robert@timetraveller.org         Linux counter ID #16440
IRC: Solver (OFTC & Freenode)           Web: http://www.practicalsysadmin.com
Contributing member of Software in the Public Interest (http://spi-inc.org/)
Open Source: The revolution that silently changed the world
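Those timing tests can be as simple as reading a fixed amount at a few
offsets across the raw device; a sketch (run as root, /dev/sdX is a
placeholder, and direct I/O is used so the page cache doesn't hide the
zone differences):

```shell
DEV=/dev/sdX
SIZE_MB=$(( $(blockdev --getsize64 "$DEV") / 1048576 ))
# Read 256MB at 0%, 25%, 50%, 75% and 95% of the device and report
# the throughput dd prints for each zone.
for pct in 0 25 50 75 95; do
    printf 'offset %3d%%: ' "$pct"
    dd if="$DEV" of=/dev/null bs=1M count=256 iflag=direct \
       skip=$(( SIZE_MB * pct / 100 )) 2>&1 | tail -n 1
done
```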
* Re: XFS journaling position
From: Robert Brockway @ 2010-10-27 14:32 UTC (permalink / raw)
To: xfs

On Wed, 27 Oct 2010, Robert Brockway wrote:
> On Wed, 27 Oct 2010, Michael Monnerie wrote:
>> in an area where disks are slower than on the outside. And if you
>> fill only half of the partition, it means your log is totally on
>> the end of the disk, access-wise.
>
> Hi Michael.  Ignoring LVM for a moment...

*Sigh*.  I meant to go on to point out that LVM further abstracts away
the hardware, making it even harder to establish where the fastest
part of the physical disk is.  Similarly, virtual hosts have little
chance of establishing the physical nature of the device holding
their filesystems.

Cheers,

Rob
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-28 9:44 UTC (permalink / raw)
To: xfs; +Cc: Robert Brockway

On Mittwoch, 27. Oktober 2010 Robert Brockway wrote:
> Similarly, virtual hosts have little chance of establishing the
> physical nature of the device holding their filesystems.

Yes, performance optimization is going to be fun in the near future.
VMs, thin provisioning, NetApp's WAFL, LVM, funny disk layouts - all
of these can do things completely differently from our "old school"
thinking. I wonder when there will be an I/O scheduler that simply
passes the I/O from a VM up to the real host, so that the host itself
can optimize and align; after all, a VM has no idea of the storage.
That's why you can already choose "noop" as the scheduler in a VM. I
guess a "virtualized" scheduler will come eventually, but we will see.

--
Michael Monnerie, Ing. BSc
* Re: XFS journaling position
From: Stan Hoeppner @ 2010-10-28 23:33 UTC (permalink / raw)
To: xfs

Michael Monnerie put forth on 10/28/2010 4:44 AM:
> On Mittwoch, 27. Oktober 2010 Robert Brockway wrote:
>> Similarly, virtual hosts have little chance of establishing the
>> physical nature of the device holding their filesystems.
>
> Yes, performance optimization is going to be fun in the near future.
> VMs, thin provisioning, NetApp's WAFL, LVM, funny disk layouts - all
> of these can do things completely differently from our "old school"
> thinking.

I don't see how any of this is really that different from where we
already are with advanced storage systems and bare-metal host OSes.
We're already virtualized WRT basic SAN arrays, and maybe even some
PCIe RAID cards if they allow carving a RAID set into LUNs.

Take for example a small FC/iSCSI SAN array controller box with 16 x
1TB SATA drives. We initialize it, using the serial console, web GUI,
or other management tool, into a single RAID 6 array with 14TB of raw
space and a 256KB stripe size. We then carve this 14TB into 10 LUNs
of 1.4TB each, and unmask each LUN to the FC WWN of a bare-metal host
running Linux. Let's assume the array controller starts at the outer
edge of each disk and works its way toward the inner cylinders when
creating each LUN, which seems like a logical way for a vendor to
implement this. We now have 10 LUNs, each with progressively less
performance than the one preceding it due to its location on the
platters.

Now, on each host we format the 1.4TB LUN with XFS. In this
configuration, given that the LUNs are spread all across the platters,
from the outer to the inner cylinders, is it really going to matter,
from a performance standpoint, where each AG or the log is located?
The only parameters we actually know for sure here are the stripe
width (14) and the stripe size (256KB). We have no knowledge of the
real layout of the cylinders when we run mkfs.xfs.

So as we move to a totally virtualized guest OS, we then lose the
stripe width and stripe size information as well. How much performance
does this really cost us WRT XFS filesystem layout? And considering
these are VM guests, which are by design meant for consolidation, not
necessarily performance, are we really losing anything at all when
looking at the big picture? How many folks are running their critical
core business databases in virtual machine guests? How about core
email systems? Other performance/business-critical applications?

--
Stan
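Stripe geometry can still be handed to mkfs.xfs manually when the
admin knows it out-of-band; assuming the 256KB figure above is the
per-disk stripe unit and 14 of the 16 drives carry data in RAID 6, a
hypothetical invocation (device name is a placeholder) would be:

```shell
# su = per-disk stripe unit, sw = number of data disks in the stripe.
mkfs.xfs -d su=256k,sw=14 /dev/mapper/lun0
```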
* Re: XFS journaling position
From: Michael Monnerie @ 2010-10-29 7:58 UTC (permalink / raw)
To: xfs; +Cc: Stan Hoeppner

On Freitag, 29. Oktober 2010 Stan Hoeppner wrote:
> So as we move to a totally virtualized guest OS, we then lose the
> stripe width and stripe size information as well. How much
> performance does this really cost us WRT XFS filesystem layout?

That is still a very straightforward config. When you use a NetApp
storage system with virtualization, you normally use thin
provisioning. So you put one VM with its root FS on that system, and
assign it a data disk of 1.4TB. You then "flexclone" that VM 9 times,
so you have 10 VMs running but they only use the disk space of one.
Then you upgrade or modify one VM, and only the blocks that are
modified get copied on the disks. You fill 5 VMs in parallel with
data, and it gets stored totally "fragmented" on the storage. Now take
a snapshot of a VM, so its old contents are frozen and every write
goes to a new place. There are companies who snapshot every VM every
hour, so their users can recover files they wrongly deleted or
modified by themselves.

And now take into account that NetApp uses WAFL - the write-anywhere
file layout. It means that even if a VM is laid out straight on disk,
as soon as you modify a block it can be written to a totally different
place on the storage. I'd say this is such a "total mess" that you
know exactly *nothing* about the layout of anything. You cannot even
optimize for stripe size.

> How many folks are running their critical core business databases in
> virtual machine guests? How about core email systems? Other
> performance/business-critical applications?

I don't know which country you are from; I'm from Austria/Europe (not
the kangaroo country :-). At every single tech talk and presentation I
attended this year, every single speaker talked about virtualization.
Depending on who spoke, either 2009 or 2010 was the year in which more
virtual servers were deployed than physical ones, with a sharp
increase each year. We didn't sell a single server running a
bare-metal OS; all had VMware or XenServer. And as for business
critical: SAP uses virtualization everywhere in-house, and ÖBB
(Austrian Railways) has run its central accounting program exclusively
on Xen since 2001. Those are just the two companies who spoke at last
week's presentation; there are far more.

--
Michael Monnerie, Ing. BSc