* XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Florian Weimer @ 2010-05-19 11:33 UTC
To: xfs
We've got a couple of rather large files, and with a cold cache,
reading the first 4K bytes of the file (e.g., just running
"head --bytes 4096" on it) takes ages, up to several minutes,
sometimes triggering the hang check timer.
I wonder if XFS reads the whole extent information into RAM when the
file is opened. Is this the case, at least on 2.6.26? Has this
changed in later versions, perhaps?
The files in question are heavily fragmented (they have been created
with holes first, and the holes have been filled in subsequently).
I'll try to run xfs_fsr on those files, but it's going to be
tough. 8-/
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Christoph Hellwig @ 2010-05-19 11:48 UTC
To: Florian Weimer; +Cc: xfs
On Wed, May 19, 2010 at 11:33:27AM +0000, Florian Weimer wrote:
> We've got a couple of rather large files, and with a cold cache,
> reading the first 4K bytes of the file (e.g., just running
> "head --bytes 4096" on it) takes ages, up to several minutes,
> sometimes triggering the hang check timer.
>
> I wonder if XFS reads the whole extent information into RAM when the
> file is opened. Is this the case, at least on 2.6.26? Has this
> changed in later versions, perhaps?
Yes, XFS always reads in the extent map, and no, this hasn't changed
recently.
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Dave Chinner @ 2010-05-19 23:27 UTC
To: Christoph Hellwig; +Cc: xfs, Florian Weimer
On Wed, May 19, 2010 at 07:48:26AM -0400, Christoph Hellwig wrote:
> On Wed, May 19, 2010 at 11:33:27AM +0000, Florian Weimer wrote:
> > We've got a couple of rather large files, and with a cold cache,
> > reading the first 4K bytes of the file (e.g., just running
> > "head --bytes 4096" on it) takes ages, up to several minutes,
> > sometimes triggering the hang check timer.
> >
> > I wonder if XFS reads the whole extent information into RAM when the
> > file is opened. Is this the case, at least on 2.6.26? Has this
> > changed in later versions, perhaps?
>
> Yes, XFS always reads in the extent map, and no, this hasn't changed
> recently.
And demand paging the in-memory information is hard. It's on a to-do
list somewhere 'round here....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Florian Weimer @ 2010-05-20 12:11 UTC
To: Christoph Hellwig; +Cc: xfs
* Christoph Hellwig:
> On Wed, May 19, 2010 at 11:33:27AM +0000, Florian Weimer wrote:
>> We've got a couple of rather large files, and with a cold cache,
>> reading the first 4K bytes of the file (e.g., just running
>> "head --bytes 4096" on it) takes ages, up to several minutes,
>> sometimes triggering the hang check timer.
>>
>> I wonder if XFS reads the whole extent information into RAM when the
>> file is opened. Is this the case, at least on 2.6.26? Has this
>> changed in later versions, perhaps?
>
> Yes, XFS always reads in the extent map, and no, this hasn't changed
> recently.
Okay, defragmenting seems to improve things considerably. But it's
going to take a while: "extents before:5309152 after:13" *sigh*
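A rough back-of-the-envelope estimate suggests why 5.3 million extents
can turn into minutes of cold-cache latency. The sketch below assumes
16-byte on-disk extent records, 4 KiB filesystem blocks with a 24-byte
btree block header, and about 10 ms per random read. None of these
figures were stated in the thread, and the variable names are purely
illustrative.

```python
# Back-of-the-envelope estimate; every constant except EXTENTS is an assumption.
EXTENTS = 5_309_152      # "extents before" figure reported by xfs_fsr
RECORD_BYTES = 16        # assumed size of one on-disk extent record
BLOCK_BYTES = 4096       # assumed filesystem block size
HEADER_BYTES = 24        # assumed per-block btree header
SEEK_SECONDS = 0.010     # assumed cost of one random read on a spinning disk

# How many extent records fit into one block after the header.
records_per_block = (BLOCK_BYTES - HEADER_BYTES) // RECORD_BYTES

# Number of blocks needed to hold all records (ceiling division).
leaf_blocks = -(-EXTENTS // records_per_block)

# Raw size of the extent records alone, in MiB.
map_mib = EXTENTS * RECORD_BYTES / 2**20

# Pessimistic case: every block is a separate random read.
worst_case_seconds = leaf_blocks * SEEK_SECONDS

print(f"{records_per_block} records per block")
print(f"{leaf_blocks} blocks, {map_mib:.0f} MiB of extent records")
print(f"~{worst_case_seconds:.0f} s if every block costs one seek")
```

Under these assumptions the extent map is on the order of 80 MiB spread
over roughly 20,000 blocks, which is consistent with a first read taking
several minutes if the blocks themselves are scattered across the disk.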
Thanks for confirming my hunch. I don't think it's worth fixing this
in XFS. The database should call posix_fallocate() before flushing
its internal cache to the file in essentially random order, but it's
difficult to get upstream to implement this (the source code is a bit
hard to follow, unfortunately).
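For illustration only, here is a minimal sketch of the kind of
preallocation meant here, written in Python rather than inside
Berkeley DB. The helper name preallocate() and the sizes are
hypothetical. Note that posix_fallocate() also extends the visible
file size when the requested range reaches past the current end of
file.

```python
import os

def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for path before data is written.

    Hypothetical helper: allocating the whole range in one call gives the
    filesystem the chance to use a few large extents instead of creating
    one small extent for every hole that is filled in later.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocates blocks for [0, size_bytes); the file size grows to
        # size_bytes if the file was shorter than that.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```

A caller would run something like preallocate("/path/to/db.file", 1 << 30)
once, before the first flush of the cache. Later writes in random order
then land in blocks that are already allocated.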
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Stewart Smith @ 2010-05-21 6:20 UTC
To: Florian Weimer, Christoph Hellwig; +Cc: xfs
On Thu, 20 May 2010 12:11:00 +0000, Florian Weimer <fweimer@bfk.de> wrote:
> Thanks for confirming my hunch. I don't think it's worth fixing this
> in XFS. The database should call posix_fallocate() before flushing
> its internal cache to the file in essentially random order, but it's
> difficult to get upstream to implement this (the source code is a bit
> hard to follow, unfortunately).
Which database?
You could always mount with allocsize or use other tools to do the
preallocation before things got too bad.
--
Stewart Smith
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Florian Weimer @ 2010-05-21 6:43 UTC
To: Stewart Smith; +Cc: Christoph Hellwig, xfs
* Stewart Smith:
> On Thu, 20 May 2010 12:11:00 +0000, Florian Weimer <fweimer@bfk.de> wrote:
>> Thanks for confirming my hunch. I don't think it's worth fixing this
>> in XFS. The database should call posix_fallocate() before flushing
>> its internal cache to the file in essentially random order, but it's
>> difficult to get upstream to implement this (the source code is a bit
>> hard to follow, unfortunately).
>
> Which database?
Oracle Berkeley DB.
> You could always mount with allocsize
This happens with "allocsize=4194304".
> or use other tools to do the preallocation before things got too
> bad.
Is there a way to transparently preallocate a few GB after the current
end of the file? That would be helpful because Berkeley DB wouldn't
have to know about it.
It's a legacy system, otherwise I would invest more effort into
putting some sort of preallocation somewhere deep into Berkeley DB.
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
* Re: XFS on 2.6.26: reading the first 4K of a large file takes ages
From: Dave Chinner @ 2010-05-21 8:26 UTC
To: Florian Weimer; +Cc: Christoph Hellwig, xfs
On Fri, May 21, 2010 at 06:43:15AM +0000, Florian Weimer wrote:
> * Stewart Smith:
>
> > On Thu, 20 May 2010 12:11:00 +0000, Florian Weimer <fweimer@bfk.de> wrote:
> >> Thanks for confirming my hunch. I don't think it's worth fixing this
> >> in XFS. The database should call posix_fallocate() before flushing
> >> its internal cache to the file in essentially random order, but it's
> >> difficult to get upstream to implement this (the source code is a bit
> >> hard to follow, unfortunately).
> >
> > Which database?
>
> Oracle Berkeley DB.
>
> > You could always mount with allocsize
>
> This happens with "allocsize=4194304".
Because allocsize only works for allocations extending the file.
> > or use other tools to do the preallocation before things got too
> > bad.
>
> Is there a way to transparently preallocate a few GB after the current
> end of the file? That would be helpful because Berkeley DB wouldn't
> have to know about it.
Yes. The fallocate() syscall has a mode that allows allocation
beyond the current end of file, as does the XFS_IOC_RESVSP ioctl.
Or, even easier, with xfs_io:
$ stat /mnt/test/foo
File: `/mnt/test/foo'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
....
$ xfs_io -f -c "resvsp 0 1048576" /mnt/test/foo
$ stat /mnt/test/foo
File: `/mnt/test/foo'
Size: 0 Blocks: 2048 IO Block: 4096 regular empty file
....
$ xfs_bmap -vp /mnt/test/foo
/mnt/test/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE       AG AG-OFFSET            TOTAL FLAGS
   0: [0..2047]:       171912..173959     0 (171912..173959)      2048 10000
$
/mnt/test/foo is still a zero-length file but has 1MB of extents allocated.
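The same reservation can be made from a program through the
fallocate() syscall. What follows is only a sketch under stated
assumptions: it calls glibc's fallocate() via ctypes on 64-bit Linux,
passes FALLOC_FL_KEEP_SIZE (value 0x01 from <linux/falloc.h>) so that
the file size stays unchanged, and uses a hypothetical helper name,
reserve_beyond_eof().

```python
import ctypes
import os

FALLOC_FL_KEEP_SIZE = 0x01  # from <linux/falloc.h>: do not change the file size

# Assumption: 64-bit Linux with glibc, where off_t is 64 bits wide.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int

def reserve_beyond_eof(path, offset, length):
    """Allocate blocks for [offset, offset + length) without growing the file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if _libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, length) != 0:
            err = ctypes.get_errno()
            # Filesystems without fallocate support report EOPNOTSUPP here.
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)
```

Calling reserve_beyond_eof("/mnt/test/foo", 0, 1048576) is intended to
mirror the xfs_io command above. Because the file size is untouched,
the application writing the file does not need to know that the space
was reserved.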
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 7 messages
2010-05-19 11:33 XFS on 2.6.26: reading the first 4K of a large file takes ages Florian Weimer
2010-05-19 11:48 ` Christoph Hellwig
2010-05-19 23:27   ` Dave Chinner
2010-05-20 12:11   ` Florian Weimer
2010-05-21  6:20     ` Stewart Smith
2010-05-21  6:43       ` Florian Weimer
2010-05-21  8:26         ` Dave Chinner