public inbox for linux-xfs@vger.kernel.org
From: Stewart Smith <stewart@mysql.com>
To: Sam Vaughan <sjv@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS_IOC_RESVSP64 versus XFS_IOC_ALLOCSP64 with multiple threads
Date: Mon, 13 Nov 2006 16:20:50 +1100
Message-ID: <1163395250.14517.38.camel@localhost.localdomain>
In-Reply-To: <12275452-56ED-4921-899F-EFF1C05B251A@sgi.com>


On Mon, 2006-11-13 at 15:53 +1100, Sam Vaughan wrote:
> Just to be clear, are we talking about intra-file fragmentation, i.e.  
> file data laid out discontiguously on disk, or inter-file  
> fragmentation where each file is contiguous on disk but the files  
> from different processes are getting interleaved?  Also, are there  
> just a couple of user data files, each of them potentially much  
> larger than the size of an AG, or do you split the data up into many  
> files, e.g. datafile01.dat ... datafile99.dat ...?

an example:

/home/mysql/cluster/ndb_1_fs/datafile1.dat:
 EXT: FILE-OFFSET       BLOCK-RANGE        AG AG-OFFSET          TOTAL
   0: [0..63]:          32862376..32862439  8 (1405096..1405159)    64
   1: [64..127]:        32875992..32876055  8 (1418712..1418775)    64
   2: [128..191]:       33040112..33040175  8 (1582832..1582895)    64
   3: [192..255]:       33080136..33080199  8 (1622856..1622919)    64
   4: [256..319]:       33101416..33101479  8 (1644136..1644199)    64
   5: [320..383]:       33112624..33112687  8 (1655344..1655407)    64
   6: [384..447]:       32526608..32526671  8 (1069328..1069391)    64
   7: [448..511]:       31678920..31678983  8 (221640..221703)      64
/home/mysql/cluster/ndb_2_fs/datafile1.dat:
 EXT: FILE-OFFSET       BLOCK-RANGE        AG AG-OFFSET          TOTAL
   0: [0..63]:          32864704..32864767  8 (1407424..1407487)    64
   1: [64..127]:        32888544..32888607  8 (1431264..1431327)    64
   2: [128..191]:       33068832..33068895  8 (1611552..1611615)    64
   3: [192..255]:       33101168..33101231  8 (1643888..1643951)    64
   4: [256..319]:       33101656..33101719  8 (1644376..1644439)    64
   5: [320..383]:       33115784..33115847  8 (1658504..1658567)    64
   6: [384..447]:       33897200..33897263  8 (2439920..2439983)    64
   7: [448..511]:       33900896..33900959  8 (2443616..2443679)    64
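
A rough sketch of how listings like the two above can be checked for intra-file fragmentation. It assumes the `xfs_bmap -v` column layout shown here, with offsets and block ranges in 512-byte units; the names `parse_extents` and `count_discontiguities` are made up for illustration, and the sample rows are copied from the first listing.

```python
import re

# Three extent rows copied from the first listing above.
SAMPLE = """\
   0: [0..63]:          32862376..32862439  8 (1405096..1405159)    64
   1: [64..127]:        32875992..32876055  8 (1418712..1418775)    64
   2: [128..191]:       33040112..33040175  8 (1582832..1582895)    64
"""

# EXT: [FILE-OFFSET]: BLOCK-RANGE AG (AG-OFFSET) TOTAL
EXTENT_RE = re.compile(
    r"^\s*(\d+):\s*\[(\d+)\.\.(\d+)\]:\s*(\d+)\.\.(\d+)"
    r"\s+(\d+)\s+\((\d+)\.\.(\d+)\)\s+(\d+)\s*$"
)


def parse_extents(text):
    """Return one dict per extent row; header and other lines are skipped."""
    extents = []
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if not m:
            continue
        _, f_start, f_end, b_start, b_end, ag, _, _, total = map(int, m.groups())
        extents.append({
            "file_start": f_start, "file_end": f_end,
            "block_start": b_start, "block_end": b_end,
            "ag": ag, "total": total,
        })
    return extents


def count_discontiguities(extents):
    """Count neighbours where the next extent does not directly follow on disk."""
    return sum(
        1
        for prev, cur in zip(extents, extents[1:])
        if cur["block_start"] != prev["block_end"] + 1
    )


if __name__ == "__main__":
    exts = parse_extents(SAMPLE)
    print(len(exts), "extents,", count_discontiguities(exts), "discontiguities")
```

Every extent in both listings starts somewhere other than where the previous one ended, so each 64-unit extent counts as its own fragment, even though all of them sit in AG 8.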

on this fs:
 isize=256    agcount=32, agsize=491520 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=15728640, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=3840, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

(somewhere between 5 and 15GB was free at the time of this create, IIRC)
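
For reference, the arithmetic behind the geometry above, as a small sketch. The constants are read off the output; the 512-byte unit for the extent listing is an assumption, though it is consistent with the AG and AG-OFFSET columns, as the last function shows. The names are illustrative only.

```python
BLOCK_SIZE = 4096        # bsize of the data section, in bytes
AG_SIZE_BLOCKS = 491520  # agsize, in filesystem blocks
AG_COUNT = 32            # agcount
BASIC_BLOCK = 512        # assumed unit of the extent listing

AG_BYTES = AG_SIZE_BLOCKS * BLOCK_SIZE      # 2013265920 bytes = 1920 MiB
FS_BYTES = AG_BYTES * AG_COUNT              # 64424509440 bytes = 60 GiB
AG_BASIC_BLOCKS = AG_BYTES // BASIC_BLOCK   # 3932160 units per AG
EXTENT_BYTES = 64 * BASIC_BLOCK             # each listed extent is 32 KiB


def ag_and_offset(disk_block):
    """Split a BLOCK-RANGE value into (AG number, offset within that AG)."""
    return divmod(disk_block, AG_BASIC_BLOCKS)


if __name__ == "__main__":
    print(AG_BYTES // 2**20, "MiB per AG")
    print(FS_BYTES // 2**30, "GiB in total")
    # First extent of ndb_1_fs/datafile1.dat starts at 32862376.
    print(ag_and_offset(32862376))
```

So an AG here is a little under 2GB, and the 64-unit extents in the listings are 32 KiB each.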

these datafiles are a fixed size, chosen by the user. a DBA would run
something like this from the SQL server:
CREATE TABLESPACE ts1
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg1
INITIAL_SIZE 1G
ENGINE NDB;

to get a tablespace with 1GB data file (on each node).

we currently don't do any automatic extending.
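
A minimal sketch of creating a fixed-size data file up front. It uses `os.posix_fallocate` as a portable stand-in; it is not the XFS_IOC_RESVSP64 / XFS_IOC_ALLOCSP64 ioctl path this thread is about, and it is not what NDB does. The helper name `preallocate` and the file path are hypothetical.

```python
import os


def preallocate(path, size_bytes):
    """Create path if needed and reserve size_bytes of space for it.

    Returns the resulting file size as reported by stat().
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the filesystem to allocate the whole range in one call,
        # rather than growing the file through many small writes.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    # Hypothetical path; 1G matches INITIAL_SIZE in the statement above.
    print(preallocate("datafile.dat", 1 << 30))
```

Asking for the full size in a single call gives the allocator the chance to pick one large extent, though how contiguous the result is still depends on the filesystem and its free space.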

> If you have the flexibility to break the data up at arbitrary points  
> into separate files, you could get optimal allocation behaviour by  
> starting a new directory as soon as the files in the current one are  
> large enough to fill an AG.  The problem with the filestreams  
> allocator is that it will only dedicate an AG to a directory for a  
> fixed and short period of time after the last file was written to  
> it.  This works well to limit the resource drain on AGs when running  
> file-per-frame video captures, but not so well with a database that  
> writes its data in a far less regimented and timely way.

for the data and undo files, we're just not changing their size except
at creation time, so that's okay.

> Now in your case you're using different directories, so your files  
> are probably OK at the start of day.  Once the AGs they start in fill  
> up though, the files for both processes will start getting allocated  
> from the next available AG.  At that point, allocations that started  
> out looking like the first test above will end up looking like the  
> second.
> 
> The filestreams allocator will stop this from happening for  
> applications that write data regularly like video ingest servers, but  
> I wouldn't expect it to be a cure-all for your database app because  
> your writes could have large delays between them.  Instead, I'd look  
> into ways to break up your data into AG-sized chunks, starting a new  
> directory every time you go over that magic size.

I'll have to check our writing behaviour for the files that change
size... but they're not too much of an issue (they're hardly ever read
back, so as long as writing them out is okay and reading isn't totally
abysmal, we don't have to worry).
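
One possible reading of the "new directory per AG-sized chunk" suggestion quoted above, as a sketch. It is a simple greedy grouping that opens a new directory before the current one would exceed one AG; the function name, file names, and sizes are made up, and the AG size comes from agsize=491520 and bsize=4096 on this fs.

```python
AG_BYTES = 491520 * 4096  # agsize * bsize from the geometry above


def assign_directories(files, ag_bytes=AG_BYTES):
    """Group (name, size) pairs into directories of at most ag_bytes each.

    A file larger than ag_bytes still gets a directory of its own.
    """
    dirs = [[]]
    used = 0
    for name, size in files:
        # Start a new directory if this file would push the current one
        # past the size of a single allocation group.
        if dirs[-1] and used + size > ag_bytes:
            dirs.append([])
            used = 0
        dirs[-1].append(name)
        used += size
    return dirs


if __name__ == "__main__":
    half_gig = 512 * 2**20
    files = [("datafile%02d.dat" % i, half_gig) for i in range(1, 6)]
    for n, group in enumerate(assign_directories(files)):
        print("dir%d:" % n, group)
```

With 1G data files and AGs of just under 2G, this grouping puts every file in its own directory, since two such files no longer fit in one AG.
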
-- 
Stewart Smith, Software Engineer
MySQL AB, www.mysql.com
Office: +14082136540 Ext: 6616
VoIP: 6616@sip.us.mysql.com
Mobile: +61 4 3 8844 332

Jumpstart your cluster:
http://www.mysql.com/consulting/packaged/cluster.html

