* [DISCUSS] xfs allocation bitmap method over linux raid
@ 2007-01-24 6:34 Raz Ben-Jehuda(caro)
2007-01-24 22:38 ` Nathan Scott
2007-01-24 22:58 ` David Chinner
0 siblings, 2 replies; 7+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2007-01-24 6:34 UTC (permalink / raw)
To: dgc; +Cc: xfs
Hello David.
I searched LKML, and hopefully you are the right person to ask
about the XFS file system in Linux.
My name is Raz and I work for a video server company.
These servers demand high throughput from the storage.
We use the XFS file system on our machines.
A video server reads a file in a sequential manner. So, if a
file extent size is not a multiple of the stripe unit size, a sequential
read over a raid would break into several small pieces, which
is undesirable for performance.
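To make the cost concrete, here is a minimal sketch (not taken from XFS or the md driver) that counts how many stripe units a single read touches. The helper name stripe_units_touched and the 64 KiB stripe unit used in the note below are assumptions chosen for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of stripe units touched by a read of
 * `len` bytes starting at byte offset `start` on the array, for a
 * stripe unit of `su` bytes. Assumes len > 0 and su > 0.
 */
uint64_t stripe_units_touched(uint64_t start, uint64_t len, uint64_t su)
{
    uint64_t first = start / su;             /* stripe unit holding the first byte */
    uint64_t last = (start + len - 1) / su;  /* stripe unit holding the last byte */

    return last - first + 1;
}
```

With an assumed 64 KiB stripe unit, a 64 KiB read starting at offset 0 touches one stripe unit, while the same read starting 4 KiB into the stripe unit touches two, so the request is split across two member disks.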
I have been examining the bitmap of a file over Linux raid5.
According to the documentation XFS tries to align a file on
stripe unit size.
What I have done is to fix the bitmap allocation method during
the writing to be aligned by the stripe unit size.
The thing is, though this seems to work, I do not know whether I
missed something.
Below is a patch (a mere two lines) I have applied to the
file system, and I would be really grateful to have your opinion.
diff -ru --exclude='*.o' /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
--- /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-06-18 01:49:35.000000000 +0000
+++ linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-12-26 14:11:02.000000000 +0000
@@ -441,8 +441,8 @@
        if (unlikely(rt)) {
                if (!(extsz = ip->i_d.di_extsize))
                        extsz = mp->m_sb.sb_rextsize;
-       } else {
-               extsz = ip->i_d.di_extsize;
+       } else {
+               extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
        }

        isize = ip->i_d.di_size;
@@ -663,7 +663,7 @@
                if (!(extsz = ip->i_d.di_extsize))
                        extsz = mp->m_sb.sb_rextsize;
        } else {
-               extsz = ip->i_d.di_extsize;
+               extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
        }

        offset_fsb = XFS_B_TO_FSBT(mp, offset);
Thank you.
--
Raz
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-24 6:34 [DISCUSS] xfs allocation bitmap method over linux raid Raz Ben-Jehuda(caro)
@ 2007-01-24 22:38 ` Nathan Scott
2007-01-28 10:32 ` Raz Ben-Jehuda(caro)
2007-01-24 22:58 ` David Chinner
1 sibling, 1 reply; 7+ messages in thread
From: Nathan Scott @ 2007-01-24 22:38 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: xfs
Hi Raz,
On Wed, 2007-01-24 at 08:34 +0200, Raz Ben-Jehuda(caro) wrote:
> David Hello.
> I have looked up in LKML and hopefully you are the one to ask in
> regard to xfs file system in Linux.
> My name is Raz and I work for a video servers company.
OOC, which one? (would be nice to put an entry for your company
on the http://oss.sgi.com/projects/xfs/users.html page).
> These servers demand high throughput from the storage.
> We applied XFS file system on our machines.
>
> A video server reads a file in a sequential manner. So, if a
Do you write the file sequentially? Buffered or direct writes?
> file extent size is not a factor of the stripe unit size a sequential
> read over a raid would break into several small pieces which
> is undesirable for performance.
>
> I have been examining the bitmap of a file over Linux raid5.
I've found that, in combination with Jens Axboe's blktrace toolkit,
to be very useful - if you have a sufficiently recent kernel, I'd
highly recommend you check out blktrace; it should help you a lot.
(bmap == block map, there's no bitmap involved)
> According to the documentation XFS tries to align a file on
> stripe unit size.
>
> What I have done is to fix the bitmap allocation method during
> the writing to be aligned by the stripe unit size.
That's not quite what the patch does, FWIW - it does two things:
- forces allocations to be stripe unit sized (not aligned)
- and, er, removes some of the per-inode extsize hint code :)
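The sized-versus-aligned distinction can be sketched as two independent checks. This is an illustration only: the function names are hypothetical, and lengths and start blocks are assumed to be expressed in filesystem blocks with a non-zero stripe unit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is the extent length a whole number of stripe units? */
bool is_stripe_sized(uint64_t len_fsb, uint64_t sunit_fsb)
{
    return len_fsb % sunit_fsb == 0;
}

/* Hypothetical check: does the extent start on a stripe unit boundary? */
bool is_stripe_aligned(uint64_t start_fsb, uint64_t sunit_fsb)
{
    return start_fsb % sunit_fsb == 0;
}
```

For example, with a stripe unit of 16 blocks, an extent of 16 blocks starting at block 5 is stripe unit sized but not aligned, so it still straddles two stripe units on disk.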
> /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> --- /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-06-18
> 01:49:35.000000000 +0000
> +++ linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-12-26 14:11:02.000000000 +0000
> @@ -441,8 +441,8 @@
> if (unlikely(rt)) {
> if (!(extsz = ip->i_d.di_extsize))
> extsz = mp->m_sb.sb_rextsize;
> - } else {
> - extsz = ip->i_d.di_extsize;
> + } else {
> + extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
> }
The real question is, why are your initial writes not being affected by
the code in xfs_iomap_eof_align_last_fsb which rounds requests to a
stripe unit boundary? Provided you are writing sequentially, you should
be seeing xfs_iomap_eof_want_preallocate return true, then later doing
stripe unit alignment in xfs_iomap_eof_align_last_fsb (because prealloc
got set earlier) ... can you trace your requests through the routines
you've modified and find why this is _not_ happening?
cheers.
--
Nathan
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-24 6:34 [DISCUSS] xfs allocation bitmap method over linux raid Raz Ben-Jehuda(caro)
2007-01-24 22:38 ` Nathan Scott
@ 2007-01-24 22:58 ` David Chinner
1 sibling, 0 replies; 7+ messages in thread
From: David Chinner @ 2007-01-24 22:58 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: dgc, xfs
On Wed, Jan 24, 2007 at 08:34:22AM +0200, Raz Ben-Jehuda(caro) wrote:
> David Hello.
> I have looked up in LKML and hopefully you are the one to ask in
> regard to xfs file system in Linux.
> My name is Raz and I work for a video servers company.
> These servers demand high throughput from the storage.
> We applied XFS file system on our machines.
>
> A video server reads a file in a sequential manner. So, if a
> file extent size is not a factor of the stripe unit size a sequential
> read over a raid would break into several small pieces which
> is undesirable for performance.
>
> I have been examining the bitmap of a file over Linux raid5.
> According to the documentation XFS tries to align a file on
> stripe unit size.
Yup.
> What I have done is to fix the bitmap allocation method during
> the writing to be aligned by the stripe unit size.
> The thing is , though this seems to work , I do not know whether I
> missed something.
>
> The bellow is a patch (a mere two lines) i have applied to the
> file system and I would be really grateful to have your opinion.
>
>
> diff -ru --exclude='*.o'
> /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> --- /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-06-18
> 01:49:35.000000000 +0000
> +++ linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-12-26 14:11:02.000000000 +0000
> @@ -441,8 +441,8 @@
> if (unlikely(rt)) {
> if (!(extsz = ip->i_d.di_extsize))
> extsz = mp->m_sb.sb_rextsize;
> - } else {
> - extsz = ip->i_d.di_extsize;
> + } else {
> + extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
> }
>
> isize = ip->i_d.di_size;
> @@ -663,7 +663,7 @@
> if (!(extsz = ip->i_d.di_extsize))
> extsz = mp->m_sb.sb_rextsize;
> } else {
> - extsz = ip->i_d.di_extsize;
> + extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
> }
>
> offset_fsb = XFS_B_TO_FSBT(mp, offset);
No, that changes the default behaviour of XFS and breaks the extent
allocation size hint code, which is what you should be using to
do this. i.e:
# xfs_io -c "chattr -R +e +E" -c "extsize <sunit>" /path/to/mnt
Will set the inode extent size on all new files and directories in the
filesystem to <sunit>. You'll get a bunch of errors from this
command because you cannot change the extsize of a file that already
has extents allocated to it, so it's best to apply this right after
mkfs when the filesystem is empty.
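As a rough illustration of why an extent size hint equal to the stripe unit gives stripe-unit-sized allocations, the sketch below models the hint as rounding a requested range outward to multiples of the hint. This is a simplified model under that assumption, not the XFS implementation; struct fsb_range and round_to_extsize are hypothetical names, and all values are in filesystem blocks within the file.

```c
#include <stdint.h>

/* Hypothetical type: a file range in filesystem blocks. */
struct fsb_range {
    uint64_t start;
    uint64_t len;
};

/*
 * Simplified model (assumption): round the requested range outward so
 * that both its start and its end are multiples of the extent size
 * hint `extsz`. A hint of 0 means "no hint" and leaves the range as is.
 */
struct fsb_range round_to_extsize(uint64_t off, uint64_t len, uint64_t extsz)
{
    struct fsb_range r = { off, len };
    uint64_t end = off + len;

    if (extsz == 0)
        return r;

    r.start = off - off % extsz;                /* round start down */
    end = ((end + extsz - 1) / extsz) * extsz;  /* round end up */
    r.len = end - r.start;
    return r;
}
```

Under this model, a 10-block request at file offset 5 with a 16-block hint becomes a single 16-block range starting at offset 0.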
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-24 22:38 ` Nathan Scott
@ 2007-01-28 10:32 ` Raz Ben-Jehuda(caro)
2007-01-28 21:49 ` Nathan Scott
2007-01-28 23:52 ` David Chinner
0 siblings, 2 replies; 7+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2007-01-28 10:32 UTC (permalink / raw)
To: nscott; +Cc: linux-xfs
First, many thanks for your reply.
See below.
On 1/25/07, Nathan Scott <nscott@aconex.com> wrote:
> Hi Raz,
>
> On Wed, 2007-01-24 at 08:34 +0200, Raz Ben-Jehuda(caro) wrote:
> > David Hello.
> > I have looked up in LKML and hopefully you are the one to ask in
> > regard to xfs file system in Linux.
>
> OOC, which one? (would be nice to put an entry for your company
> on the http://oss.sgi.com/projects/xfs/users.html page).
>
> > These servers demand high throughput from the storage.
> > We applied XFS file system on our machines.
> >
> > A video server reads a file in a sequential manner. So, if a
>
> Do you write the file sequentially? Buffered or direct writes?
It does not matter. Even a command like:
dd if=/dev/zero of=/d1/xxx bs=1M count=1000
will reveal extents whose size modulo the stripe unit != 0
> > file extent size is not a factor of the stripe unit size a sequential
> > read over a raid would break into several small pieces which
> > is undesirable for performance.
> >
> > I have been examining the bitmap of a file over Linux raid5.
>
> I've found that, in combination with Jens Axboe's blktrace toolkit
> to be very useful - if you have a sufficiently recent kernel, I'd
> highly recommend you check out blktrace, it should help you alot.
>
> (bmap == block map, theres no bitmap involved)
>
> > According to the documentation XFS tries to align a file on
> > stripe unit size.
> >
> > What I have done is to fix the bitmap allocation method during
> > the writing to be aligned by the stripe unit size.
>
> Thats not quite what the patch does, FWIW - it does two things:
> - forces allocations to be stripe unit sized (not aligned)
Which is what I meant.
> - and, er, removes some of the per-inode extsize hint code :)
What is it?
Could my fix cause any damage?
What sort of damage?
> > /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> > linux-2.6.17-UNI/fs/xfs/xfs_iomap.c
> > --- /d1/rt/kernels/linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-06-18
> > 01:49:35.000000000 +0000
> > +++ linux-2.6.17-UNI/fs/xfs/xfs_iomap.c 2006-12-26 14:11:02.000000000 +0000
> > @@ -441,8 +441,8 @@
> > if (unlikely(rt)) {
> > if (!(extsz = ip->i_d.di_extsize))
> > extsz = mp->m_sb.sb_rextsize;
> > - } else {
> > - extsz = ip->i_d.di_extsize;
> > + } else {
> > + extsz = mp->m_dalign; // raz fix alignment to raid stripe unit
> > }
>
> The real question is, why are your initial writes not being affected by
> the code in xfs_iomap_eof_align_last_fsb which rounds requests to a
> stripe unit boundary?
I debugged xfs_iomap_write_delay:
ip->i_d.di_extsize is zero and prealloc is zero. Is that correct?
Isn't it supposed to be the stripe unit size in pages?
Also, xfs_iomap_eof_align_last_fsb has this line:
if (io->io_flags & XFS_IOCORE_RT)
;
> Provided you are writing sequentially, you should
> be seeing xfs_iomap_eof_want_preallocate return true, then later doing
> stripe unit alignment in xfs_iomap_eof_align_last_fsb (because prealloc
> got set earlier) ... can you trace your requests through the routines
> you've modified and find why this is _not_ happening?
>
> cheers.
>
> --
> Nathan
>
>
--
Raz
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-28 10:32 ` Raz Ben-Jehuda(caro)
@ 2007-01-28 21:49 ` Nathan Scott
2007-01-29 21:49 ` Nathan Scott
2007-01-28 23:52 ` David Chinner
1 sibling, 1 reply; 7+ messages in thread
From: Nathan Scott @ 2007-01-28 21:49 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: xfs
On Sun, 2007-01-28 at 12:32 +0200, Raz Ben-Jehuda(caro) wrote:
> > OOC, which one? (would be nice to put an entry for your company
> > on the http://oss.sgi.com/projects/xfs/users.html page).
> >
> dd if=/dev/zero of=/d1/xxx bs=1M count=1000
> will reveil extents of size modulo(stripe unit ) != 0
Does using direct IO change things (oflag=direct to dd, iirc)?
> > - and, er, removes some of the per-inode extsize hint code :)
> what is it?
See the "extsize" command within xfs_io(8).
> could my fix make any damage ?
> what sort of a damage ?
Not really "damage" (as in filesystem integrity); it's more that it
accidentally breaks existing functionality.
> > The real question is, why are your initial writes not being affected by
> > the code in xfs_iomap_eof_align_last_fsb which rounds requests to a
> > stripe unit boundary?
>
> I debugged xfs_iomap_write_delay:
> ip->i_d.di_extsize is zero and prealloc is zero. is it correct ?
prealloc shouldn't be zero for writes that will extend the file size;
but now that I think about it, I'm not sure how it could ever get set
for a buffered write (delalloc), since by the time we come to do the
actual allocation and writes to disk, the inode size will be beyond
the allocation offset. Hmm, maybe the logic in there needs a rethink
(any thoughts there, Dave/Lachlan?)
> isn't it suppose stripe unit size in pages ?
No, extsize is not and should not be set unless it's explicitly been
asked for (see the man page I referred to above).
> Also , xfs_iomap_eof_align_last_fsb has this line :
> if (io->io_flags & XFS_IOCORE_RT)
Are you using the realtime subvolume? You didn't mention that before,
so I guess you're not - in which case, the above line is not relevant
to you.
> > Provided you are writing sequentially, you should
> > be seeing xfs_iomap_eof_want_preallocate return true, then later doing
> > stripe unit alignment in xfs_iomap_eof_align_last_fsb (because prealloc
> > got set earlier) ... can you trace your requests through the routines
> > you've modified and find why this is _not_ happening?
cheers.
--
Nathan
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-28 10:32 ` Raz Ben-Jehuda(caro)
2007-01-28 21:49 ` Nathan Scott
@ 2007-01-28 23:52 ` David Chinner
1 sibling, 0 replies; 7+ messages in thread
From: David Chinner @ 2007-01-28 23:52 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: nscott, linux-xfs
On Sun, Jan 28, 2007 at 12:32:23PM +0200, Raz Ben-Jehuda(caro) wrote:
> first many thanks to your reply.
> see bellow.
>
> On 1/25/07, Nathan Scott <nscott@aconex.com> wrote:
> >Hi Raz,
> >
> >On Wed, 2007-01-24 at 08:34 +0200, Raz Ben-Jehuda(caro) wrote:
> >> David Hello.
> >> I have looked up in LKML and hopefully you are the one to ask in
> >> regard to xfs file system in Linux.
>
> >
> >OOC, which one? (would be nice to put an entry for your company
> >on the http://oss.sgi.com/projects/xfs/users.html page).
> >
> >> These servers demand high throughput from the storage.
> >> We applied XFS file system on our machines.
> >>
> >> A video server reads a file in a sequential manner. So, if a
> >
> >Do you write the file sequentially? Buffered or direct writes?
> does not matter. even command like:
> dd if=/dev/zero of=/d1/xxx bs=1M count=1000
> will reveil extents of size modulo(stripe unit ) != 0
Did you make the filesystem with a stripe unit set properly?
Can you post the output of 'xfs_info -n /path/to/mntpt'?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [DISCUSS] xfs allocation bitmap method over linux raid
2007-01-28 21:49 ` Nathan Scott
@ 2007-01-29 21:49 ` Nathan Scott
0 siblings, 0 replies; 7+ messages in thread
From: Nathan Scott @ 2007-01-29 21:49 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: xfs
On Mon, 2007-01-29 at 08:49 +1100, Nathan Scott wrote:
> prealloc shouldn't be zero for writes that will extend the file size;
> but now that I think about it, I'm not sure how it could ever get set
> for a buffered write (delalloc), since by the time we come to do the
> actual allocation and writes to disk, the inode size will be beyond
> the allocation offset. Hmm, maybe the logic in there needs a rethink
> (any thoughts there, Dave/Lachlan?)
I had a closer look, and remember now how this works - I was looking
in the wrong place entirely. For real (not delayed) allocations, the
stripe alignment is performed within the allocator, so deep down in
the xfs_bmapi -> xfs_bmap_alloc -> xfs_bmap_btalloc call path.
In particular, see the big comment mid-way through xfs_bmap_btalloc:
* If we are not low on available data blocks, and the
* underlying logical volume manager is a stripe, and
* the file offset is zero then try to allocate data
* blocks on stripe unit boundary.
* NOTE: ap->aeof is only set if the allocation length
* is >= the stripe unit and the allocation offset is
* at the end of file.
(the "file offset is zero" part seems misleading to me, since it
is not only aligning in that case).
So, the real answer to your "why isn't it aligning" question lies
in there - if you can instrument that code and figure out why you
aren't seeing allocation alignment adjustments inside there, you
should be 99% of the way to understanding your problem.
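When instrumenting, it may help to keep the conditions from that comment in front of you. The sketch below is only a model of the comment's wording, not the code in xfs_bmap_btalloc; struct alloc_req, its fields, and wants_stripe_alignment are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one allocation request, in filesystem blocks. */
struct alloc_req {
    bool low_space;      /* low on available data blocks */
    uint64_t sunit_fsb;  /* stripe unit, 0 if the volume is not striped */
    uint64_t len_fsb;    /* allocation length */
    bool at_eof;         /* allocation offset is at the end of file */
};

/*
 * Model of the comment: alignment is attempted only when space is not
 * low, a stripe unit is configured, and the "aeof" condition holds,
 * i.e. the allocation is at end of file and at least one stripe unit long.
 */
bool wants_stripe_alignment(const struct alloc_req *req)
{
    bool aeof = req->at_eof && req->sunit_fsb != 0 &&
                req->len_fsb >= req->sunit_fsb;

    return !req->low_space && req->sunit_fsb != 0 && aeof;
}
```

If any one of these inputs is not what you expect for your dd test (for instance an allocation shorter than the stripe unit), this model says no alignment would be attempted, which is the kind of thing worth checking for in the real code path.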
cheers.
--
Nathan