* nfs question - ftruncate vs pwrite
@ 2005-12-07 20:46 Kenny Simpson
2005-12-07 21:14 ` Peter Staubach
0 siblings, 1 reply; 6+ messages in thread
From: Kenny Simpson @ 2005-12-07 20:46 UTC (permalink / raw)
To: linux kernel
[-- Attachment #1: Type: text/plain, Size: 706 bytes --]
Sorry about the previous partial message...
If a file is extended via ftruncate, the new empty pages are read in before the ftruncate
returns (taking 64 ms on my machine), but if the file is extended via pwrite, nothing is read in
and the system call is very quick (34 us).
Why is there such a difference? Is there another cheap way to grow a file and map in its new
pages? Am I missing some other semantic difference between ftruncate and a pwrite past the end of
the file?
Here is a test program; compile with -DABUSE to get the pwrite version.
thanks,
-Kenny
[-- Attachment #2: 862959384-dtest.c --]
[-- Type: text/x-csrc; name="dtest.c", Size: 1379 bytes --]
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
	int fd;
	unsigned long long int const size = 4096 * 1024;	/* 4 MB window = 1024 pages */
	unsigned int const size_page = 1024;			/* window size in pages */
	unsigned long long int offset = 0;
	unsigned int offset_page = 0;

	if (argc != 2) {
		printf("usage: %s <filename>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC | O_LARGEFILE /*| O_DIRECT*/, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Extend the file by one window: either write one byte past the
	   end (-DABUSE) or truncate upward. */
#ifdef ABUSE
	if (pwrite64(fd, "", 1, offset + size) < 0)
		perror("pwrite64");
#else
	if (ftruncate64(fd, offset + size) < 0)
		perror("ftruncate64");
#endif

	char* mapping = (char*)mmap64(0, size, PROT_READ | PROT_WRITE,
				      MAP_SHARED | MAP_POPULATE | MAP_NONBLOCK, fd, offset);
	if (mapping == MAP_FAILED) {
		perror("mmap64");
		return 1;
	}
	memset(mapping, 'a', size);

	for (;;) {
		offset += size;
		offset_page += size_page;
#ifdef ABUSE
		pwrite64(fd, "", 1, offset + size);
#else
		ftruncate64(fd, offset + size);
#endif
		/* Slide the existing mapping to the new window instead of
		   doing munmap()/mmap64() each time. */
		remap_file_pages(mapping, size, 0, offset_page, MAP_NONBLOCK);
		memset(mapping, 'a', size);
	}

	close(fd);	/* not reached */
	return 0;
}
* Re: nfs question - ftruncate vs pwrite
2005-12-07 20:46 nfs question - ftruncate vs pwrite Kenny Simpson
@ 2005-12-07 21:14 ` Peter Staubach
2005-12-07 21:50 ` Kenny Simpson
0 siblings, 1 reply; 6+ messages in thread
From: Peter Staubach @ 2005-12-07 21:14 UTC (permalink / raw)
To: Kenny Simpson; +Cc: linux kernel
Kenny Simpson wrote:
>Sorry about the previous partial message...
>
>If a file is extended via ftruncate, the new empty pages are read in before the ftruncate
>returns (taking 64 ms on my machine), but if the file is extended via pwrite, nothing is read in
>and the system call is very quick (34 us).
>
>Why is there such a difference? Is there another cheap way to grow a file and map in its new
>pages? Am I missing some other semantic difference between ftruncate and a pwrite past the end of
>the file?
>
You might use tcpdump or Ethereal to see what the different traffic looks
like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
leads to a WRITE operation.
ps
* Re: nfs question - ftruncate vs pwrite
2005-12-07 21:14 ` Peter Staubach
@ 2005-12-07 21:50 ` Kenny Simpson
2005-12-08 4:53 ` Trond Myklebust
0 siblings, 1 reply; 6+ messages in thread
From: Kenny Simpson @ 2005-12-07 21:50 UTC (permalink / raw)
To: Peter Staubach; +Cc: linux kernel
--- Peter Staubach <staubach@redhat.com> wrote:
> You might use tcpdump or Ethereal to see what the different traffic looks
> like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> leads to a WRITE operation.
Ethereal results interpreted with wild speculation:
The pwrite case:
This does a bunch of reads, but the server always returns a short read indicating EOF. It
seems that a pwrite does cause a getattr call, but that's it.
Once memory is exhausted, the pages are written out.
The ftruncate case:
This does a setattr, then does a read - this time the server responds with a large amount of
0's.
Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
extending the file... is it strictly necessary to read in pages of 0's from the server?
-Kenny
* Re: nfs question - ftruncate vs pwrite
2005-12-07 21:50 ` Kenny Simpson
@ 2005-12-08 4:53 ` Trond Myklebust
2005-12-08 5:00 ` Trond Myklebust
2005-12-08 16:15 ` Kenny Simpson
0 siblings, 2 replies; 6+ messages in thread
From: Trond Myklebust @ 2005-12-08 4:53 UTC (permalink / raw)
To: Kenny Simpson; +Cc: Peter Staubach, linux kernel
On Wed, 2005-12-07 at 13:50 -0800, Kenny Simpson wrote:
> --- Peter Staubach <staubach@redhat.com> wrote:
> > You might use tcpdump or Ethereal to see what the different traffic looks
> > like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> > leads to a WRITE operation.
>
> Ethereal results interpreted with wild speculation:
> The pwrite case:
> This does a bunch of reads, but the server always returns a short read indicating EOF. It
> seems that a pwrite does cause a getattr call, but that's it.
> Once memory is exhausted, the pages are written out.
>
> The ftruncate case:
> This does a setattr, then does a read - this time the server responds with a large amount of
> 0's.
That is as expected. The ftruncate() causes an immediate change in
length of the file on the server, and so reads will. In the case of
pwrite(), that is cached on the client until you fsync/close, and so the
server returns short reads.
> Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> extending the file... is it strictly necessary to read in pages of 0's from the server?
Possibly not, but is this a common case that is worth optimising for?
Note that use of the standard write() syscall as opposed to mmap() will
not trigger this avalanche of page-ins.
Cheers,
Trond
* Re: nfs question - ftruncate vs pwrite
2005-12-08 4:53 ` Trond Myklebust
@ 2005-12-08 5:00 ` Trond Myklebust
2005-12-08 16:15 ` Kenny Simpson
1 sibling, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2005-12-08 5:00 UTC (permalink / raw)
To: Kenny Simpson; +Cc: Peter Staubach, linux kernel
On Wed, 2005-12-07 at 23:53 -0500, Trond Myklebust wrote:
> On Wed, 2005-12-07 at 13:50 -0800, Kenny Simpson wrote:
> > --- Peter Staubach <staubach@redhat.com> wrote:
> > > You might use tcpdump or Ethereal to see what the different traffic looks
> > > like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> > > leads to a WRITE operation.
> >
> > Ethereal results interpreted with wild speculation:
> > The pwrite case:
> > This does a bunch of reads, but the server always returns a short read indicating EOF. It
> > seems that a pwrite does cause a getattr call, but that's it.
> > Once memory is exhausted, the pages are written out.
> >
> > The ftruncate case:
> > This does a setattr, then does a read - this time the server responds with a large amount of
> > 0's.
>
> That is as expected. The ftruncate() causes an immediate change in
> length of the file on the server, and so reads will.
...Err...
...and so reads of the empty pages will succeed.
> In the case of
> pwrite(), that is cached on the client until you fsync/close, and so the
> server returns short reads.
>
> > Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> > extending the file... is it strictly necessary to read in pages of 0's from the server?
>
> Possibly not, but is this a common case that is worth optimising for?
> Note that use of the standard write() syscall as opposed to mmap() will
> not trigger this avalanche of page-ins.
>
> Cheers,
> Trond
>
* Re: nfs question - ftruncate vs pwrite
2005-12-08 4:53 ` Trond Myklebust
2005-12-08 5:00 ` Trond Myklebust
@ 2005-12-08 16:15 ` Kenny Simpson
1 sibling, 0 replies; 6+ messages in thread
From: Kenny Simpson @ 2005-12-08 16:15 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Peter Staubach, linux kernel
--- Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> That is as expected. The ftruncate() causes an immediate change in
> length of the file on the server, and so reads will. In the case of
> pwrite(), that is cached on the client until you fsync/close, and so the
> server returns short reads.
>
> > Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> > extending the file... is it strictly necessary to read in pages of 0's from the server?
>
> Possibly not, but is this a common case that is worth optimising for?
I am attempting to write a low-latency logger. 'Low' meaning even one system call per message is
too slow (measured at 0.3 microseconds). So I am trying to use the page cache to handle the
background scheduling of bulk writes to the server, and as an extra layer of reliability in the
event of a program crash. The use of pwrite seems to be the best option at this time, as spending
a few milliseconds on an ftruncate is a show-stopper.
I could also just write locally into a shared memory region, and have my own background copy to
the server, but this seems a bit wasteful when the page cache does most of this already, and can
optimize page-sized writes.
-Kenny