* size of nfsv4 writes
@ 2008-06-04 16:40 Olga Kornievskaia
  2008-06-04 16:46 ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Olga Kornievskaia @ 2008-06-04 16:40 UTC (permalink / raw)
  To: Trond Myklebust, Chuck Lever, linux-nfs


While testing NFSv4 performance over the 10GE network, we are seeing the 
following behavior and would like to know if it is normal or a bug in 
the client code.

The server offers a max_write of 1M. The client mounts the server with 
the "wsize" option set to 1M. Yet during the write we are seeing that the 
write size is at most 49K. Why does the client never come close to the 1M limit?

Thanks.

-Olga


* Re: size of nfsv4 writes
  2008-06-04 16:40 size of nfsv4 writes Olga Kornievskaia
@ 2008-06-04 16:46 ` Trond Myklebust
  2008-06-04 22:04   ` Olga Kornievskaia
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2008-06-04 16:46 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Chuck Lever, linux-nfs

On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
> While testing NFSv4 performance over the 10GE network, we are seeing the 
> following behavior and would like to know if it is normal or a bug in 
> the client code.
> 
> The server offers the max_write of 1M. The client mounts the server with 
> the "wsize" option of 1M. Yet during the write we are seeing that the 
> write size is at most 49K. Why does client never come close to 1M limit?

I have a feeling that is due to some crap in the VM. I'm currently
investigating a situation where it appears we're sending 1 COMMIT for
every 1-5 32k WRITEs. This is not a policy that stems from the NFS
client, so it would appear that the VM is being silly about things.

I'm especially suspicious of the code in get_dirty_limits() that is
setting a limit to the number of dirty pages based on the number of
pages a given BDI has written out in the recent past. As far as I can
see, the intention is to penalise devices that are slow writers, but in
practice it doesn't do that: it penalises the devices that have the
least activity.
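
To make the suspected effect concrete, here is a deliberately simplified
model of a per-device dirty limit that is proportional to recent writeout.
It is only a sketch: the function name, the device names, and the numbers
are made up for illustration, and the real get_dirty_limits() does more
than this (this is not the kernel's code).

```python
def bdi_dirty_limit(global_dirty_limit, recent_writeout, bdi):
    """Share of the global dirty-page limit granted to one device.

    Simplified model: the share is proportional to the pages the device
    wrote out recently. All names here are hypothetical.
    """
    total = sum(recent_writeout.values())
    if total == 0:
        # No writeback history at all: split evenly (assumption of this sketch).
        return global_dirty_limit // len(recent_writeout)
    return global_dirty_limit * recent_writeout[bdi] // total


# Hypothetical recent writeout counts (pages) for a busy disk and an NFS mount
recent = {"sda": 9000, "nfs": 1000}
limits = {name: bdi_dirty_limit(50000, recent, name) for name in recent}
print(limits)
```

Under this model the device with the least recent activity gets the
smallest dirty-page allowance, regardless of how fast it could write.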

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


* Re: size of nfsv4 writes
  2008-06-04 16:46 ` Trond Myklebust
@ 2008-06-04 22:04   ` Olga Kornievskaia
  2008-06-06  0:25     ` Dean Hildebrand
  2008-06-12 21:41     ` Olga Kornievskaia
  0 siblings, 2 replies; 8+ messages in thread
From: Olga Kornievskaia @ 2008-06-04 22:04 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, linux-nfs



Trond Myklebust wrote:
> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>   
>> While testing NFSv4 performance over the 10GE network, we are seeing the 
>> following behavior and would like to know if it is normal or a bug in 
>> the client code.
>>
>> The server offers the max_write of 1M. The client mounts the server with 
>> the "wsize" option of 1M. Yet during the write we are seeing that the 
>> write size is at most 49K. Why does client never come close to 1M limit?
>>     
>
> I have a feeling that is due to some crap in the VM. I'm currently
> investigating a situation where it appears we're sending 1 COMMIT for
> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
> client, so it would appear that the VM is being silly about things.
>
> I'm specially suspicious of the code in get_dirty_limits() that is
> setting a limit to the number of dirty pages based on the number of
> pages a given BDI has written out in the recent past. As far as I can
> see, the intention is to penalise devices that are slow writers, but in
> practice it doesn't do that: it penalises the devices that have the
> least activity.
>
>   
I think we are seeing a larger than usual number of COMMIT messages.


* Re: size of nfsv4 writes
  2008-06-04 22:04   ` Olga Kornievskaia
@ 2008-06-06  0:25     ` Dean Hildebrand
  2008-06-12 21:41     ` Olga Kornievskaia
  1 sibling, 0 replies; 8+ messages in thread
From: Dean Hildebrand @ 2008-06-06  0:25 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, Chuck Lever, linux-nfs



Olga Kornievskaia wrote:
>
>
> Trond Myklebust wrote:
>> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>>  
>>> While testing NFSv4 performance over the 10GE network, we are seeing 
>>> the following behavior and would like to know if it is normal or a 
>>> bug in the client code.
>>>
>>> The server offers the max_write of 1M. The client mounts the server 
>>> with the "wsize" option of 1M. Yet during the write we are seeing 
>>> that the write size is at most 49K. Why does client never come close 
>>> to 1M limit?
Does /proc/mounts indicate 1M?  As a total guess, could there be 
something going on in nfs_can_coalesce_requests()?  (I can't imagine why, 
but we had a pnfs problem where our additions to 
nfs_can_coalesce_requests() were causing similar behavior.)
>>>     
>>
>> I have a feeling that is due to some crap in the VM. I'm currently
>> investigating a situation where it appears we're sending 1 COMMIT for
>> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
>> client, so it would appear that the VM is being silly about things.
>>
>> I'm specially suspicious of the code in get_dirty_limits() that is
>> setting a limit to the number of dirty pages based on the number of
>> pages a given BDI has written out in the recent past. As far as I can
>> see, the intention is to penalise devices that are slow writers, but in
>> practice it doesn't do that: it penalises the devices that have the
>> least activity.
>>
>>   
> I think we are seeing larger than usual number of COMMIT messages.
With older kernels, no matter how I configured Linux to flush dirty 
pages, it would always flush too often and hence send way too many 
COMMIT messages.  But that was more like every couple hundred megabytes.

If you don't need the page cache, I have found that O_DIRECT can 
increase performance by giving predefined points at which the client 
will commit.
Dean
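
The /proc/mounts check suggested above could be scripted roughly like
this. It is a sketch that assumes the usual whitespace-separated
/proc/mounts layout (device, mount point, type, options); the sample
line and the helper name nfs_wsizes are invented for illustration.

```python
def nfs_wsizes(mounts_text):
    """Map each NFS mount point to the wsize (bytes) listed in its options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        for opt in fields[3].split(","):
            if opt.startswith("wsize="):
                result[fields[1]] = int(opt[len("wsize="):])
    return result


# Hypothetical /proc/mounts content, not taken from the system in this thread
sample = (
    "server:/export /mnt/nfs nfs4 rw,vers=4,rsize=1048576,wsize=1048576,hard 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)
print(nfs_wsizes(sample))
```

On a live client you would pass in open("/proc/mounts").read() instead
of the sample string.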


* Re: size of nfsv4 writes
  2008-06-04 22:04   ` Olga Kornievskaia
  2008-06-06  0:25     ` Dean Hildebrand
@ 2008-06-12 21:41     ` Olga Kornievskaia
  2008-06-13 16:33       ` Chuck Lever
  1 sibling, 1 reply; 8+ messages in thread
From: Olga Kornievskaia @ 2008-06-12 21:41 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, Chuck Lever, linux-nfs



Olga Kornievskaia wrote:
>
>
> Trond Myklebust wrote:
>> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>>  
>>> While testing NFSv4 performance over the 10GE network, we are seeing 
>>> the following behavior and would like to know if it is normal or a 
>>> bug in the client code.
>>>
>>> The server offers the max_write of 1M. The client mounts the server 
>>> with the "wsize" option of 1M. Yet during the write we are seeing 
>>> that the write size is at most 49K. Why does client never come close 
>>> to 1M limit?
>>>     
>>
>> I have a feeling that is due to some crap in the VM. I'm currently
>> investigating a situation where it appears we're sending 1 COMMIT for
>> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
>> client, so it would appear that the VM is being silly about things.
>>
>> I'm specially suspicious of the code in get_dirty_limits() that is
>> setting a limit to the number of dirty pages based on the number of
>> pages a given BDI has written out in the recent past. As far as I can
>> see, the intention is to penalise devices that are slow writers, but in
>> practice it doesn't do that: it penalises the devices that have the
>> least activity.
>>
>>   
> I think we are seeing larger than usual number of COMMIT messages.
Using Chuck's nfs-iostats to monitor an NFS write, I can see that each 
operation writes about 830MB. Why is it so much smaller than wsize=1M?




* Re: size of nfsv4 writes
  2008-06-12 21:41     ` Olga Kornievskaia
@ 2008-06-13 16:33       ` Chuck Lever
  2008-06-13 18:19         ` Olga Kornievskaia
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2008-06-13 16:33 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, linux-nfs

On Jun 12, 2008, at 5:41 PM, Olga Kornievskaia wrote:
> Olga Kornievskaia wrote:
>> Trond Myklebust wrote:
>>> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>>>> While testing NFSv4 performance over the 10GE network, we are  
>>>> seeing the following behavior and would like to know if it is  
>>>> normal or a bug in the client code.
>>>>
>>>> The server offers the max_write of 1M. The client mounts the  
>>>> server with the "wsize" option of 1M. Yet during the write we are  
>>>> seeing that the write size is at most 49K. Why does client never  
>>>> come close to 1M limit?
>>>
>>> I have a feeling that is due to some crap in the VM. I'm currently
>>> investigating a situation where it appears we're sending 1 COMMIT  
>>> for
>>> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
>>> client, so it would appear that the VM is being silly about things.
>>>
>>> I'm specially suspicious of the code in get_dirty_limits() that is
>>> setting a limit to the number of dirty pages based on the number of
>>> pages a given BDI has written out in the recent past. As far as I  
>>> can
>>> see, the intention is to penalise devices that are slow writers,  
>>> but in
>>> practice it doesn't do that: it penalises the devices that have the
>>> least activity.
>>>
>> I think we are seeing larger than usual number of COMMIT messages.
> Using Chuck's nfs-iostats to monitor an NFS write I can see that  
> each operation writes about 830MB. Why is so much small than wsize=1M?


I assume you mean 830KB.

Remember that nfs-iostats reports an average transfer size, so you may  
be seeing a lot of 1MB writes on the wire, and just enough small  
writes to reduce the average.  Or, the client may not be writing 1MB  
at all.

You have to look at a network trace to see which.

On the other hand, 830KB is still very large.
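
A small worked example of why the average alone cannot distinguish the
two cases. The write mixes below are hypothetical, chosen only so that
both come out to an 830KB average; they are not measurements.

```python
KB = 1024
MB = 1024 * KB


def average_size(sizes):
    """Mean transfer size in bytes, as an averaging tool would report it."""
    return sum(sizes) / len(sizes)


# Case 1 (hypothetical): mostly full 1MB writes plus a few small ones
mixed = [1 * MB] * 8 + [54 * KB] * 2
# Case 2 (hypothetical): the client never sends a full 1MB write
uniform = [830 * KB] * 10

print(average_size(mixed) / KB, average_size(uniform) / KB)
```

Both lists report the same average, which is why a network trace is
needed to see the individual write sizes.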

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: size of nfsv4 writes
  2008-06-13 16:33       ` Chuck Lever
@ 2008-06-13 18:19         ` Olga Kornievskaia
  2008-06-13 19:03           ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Olga Kornievskaia @ 2008-06-13 18:19 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, linux-nfs



Chuck Lever wrote:
> On Jun 12, 2008, at 5:41 PM, Olga Kornievskaia wrote:
>> Olga Kornievskaia wrote:
>>> Trond Myklebust wrote:
>>>> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>>>>> While testing NFSv4 performance over the 10GE network, we are 
>>>>> seeing the following behavior and would like to know if it is 
>>>>> normal or a bug in the client code.
>>>>>
>>>>> The server offers the max_write of 1M. The client mounts the 
>>>>> server with the "wsize" option of 1M. Yet during the write we are 
>>>>> seeing that the write size is at most 49K. Why does client never 
>>>>> come close to 1M limit?
>>>>
>>>> I have a feeling that is due to some crap in the VM. I'm currently
>>>> investigating a situation where it appears we're sending 1 COMMIT for
>>>> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
>>>> client, so it would appear that the VM is being silly about things.
>>>>
>>>> I'm specially suspicious of the code in get_dirty_limits() that is
>>>> setting a limit to the number of dirty pages based on the number of
>>>> pages a given BDI has written out in the recent past. As far as I can
>>>> see, the intention is to penalise devices that are slow writers, 
>>>> but in
>>>> practice it doesn't do that: it penalises the devices that have the
>>>> least activity.
>>>>
>>> I think we are seeing larger than usual number of COMMIT messages.
>> Using Chuck's nfs-iostats to monitor an NFS write I can see that each 
>> operation writes about 830MB. Why is so much small than wsize=1M?
>
>
> I assume you mean 830KB.
>
> Remember that nfs-iostats reports an average transfer size, so you may 
> be seeing a lot of 1MB writes on the wire, and just enough small 
> writes to reduce the average.  Or, the client may not be writing 1MB 
> at all.
>
> You have to look at a network trace to see which.
>
> On the other hand, 830KB is still very large.
Apologies, yes, it is 830KB. If you say it's an average write size, then my 
question is: why is NFS breaking down 1M writes into smaller chunks? When 
I say 1M write, I'm referring to userland (dd) calling write() with a 
1M buffer.

I'm trying to understand why NFS has poor write performance. I had 2 
leads to pursue (1) nfs-iostats shows that each write operation is 
 >100KB smaller than a read operation and (2) I see that during a write 
nfs-iostats reports fewer operations per second than during a read. The 
latter can be due to the COMMIT problem.

If the NFS client writes less data in each operation and cannot issue 
writes fast enough, wouldn't that explain its poor performance?

Current read performance is 590MB/s
Current write performance is 230MB/s
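
A rough back-of-the-envelope check of the two leads, treating
throughput as operations per second times average operation size. The
830KB write size and the two throughput figures are from this thread;
the 930KB read size is an assumption (only ">100KB larger" is known),
and MB/KB are taken as powers of two.

```python
KB = 1024
MB = 1024 * KB


def ops_per_second(throughput_bytes, avg_op_bytes):
    """Operations per second implied by a throughput and an average op size."""
    return throughput_bytes / avg_op_bytes


write_ops = ops_per_second(230 * MB, 830 * KB)
read_ops = ops_per_second(590 * MB, 930 * KB)  # 930KB read size is assumed

size_ratio = 930 / 830            # how much larger a read op is
ops_ratio = read_ops / write_ops  # how many more read ops per second

print(f"write ops/s ~{write_ops:.0f}, read ops/s ~{read_ops:.0f}")
print(f"size ratio {size_ratio:.2f}, ops ratio {ops_ratio:.2f}")
```

Under these assumptions the operation rate differs far more than the
operation size does, so lead (2) would account for most of the gap.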

-O
>
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>


* Re: size of nfsv4 writes
  2008-06-13 18:19         ` Olga Kornievskaia
@ 2008-06-13 19:03           ` Chuck Lever
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2008-06-13 19:03 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, linux-nfs

On Jun 13, 2008, at 2:19 PM, Olga Kornievskaia wrote:
> Chuck Lever wrote:
>> On Jun 12, 2008, at 5:41 PM, Olga Kornievskaia wrote:
>>> Olga Kornievskaia wrote:
>>>> Trond Myklebust wrote:
>>>>> On Wed, 2008-06-04 at 12:40 -0400, Olga Kornievskaia wrote:
>>>>>> While testing NFSv4 performance over the 10GE network, we are  
>>>>>> seeing the following behavior and would like to know if it is  
>>>>>> normal or a bug in the client code.
>>>>>>
>>>>>> The server offers the max_write of 1M. The client mounts the  
>>>>>> server with the "wsize" option of 1M. Yet during the write we  
>>>>>> are seeing that the write size is at most 49K. Why does client  
>>>>>> never come close to 1M limit?
>>>>>
>>>>> I have a feeling that is due to some crap in the VM. I'm currently
>>>>> investigating a situation where it appears we're sending 1  
>>>>> COMMIT for
>>>>> every 1-5 32k WRITEs. This is not a policy that stems from the NFS
>>>>> client, so it would appear that the VM is being silly about  
>>>>> things.
>>>>>
>>>>> I'm specially suspicious of the code in get_dirty_limits() that is
>>>>> setting a limit to the number of dirty pages based on the number  
>>>>> of
>>>>> pages a given BDI has written out in the recent past. As far as  
>>>>> I can
>>>>> see, the intention is to penalise devices that are slow writers,  
>>>>> but in
>>>>> practice it doesn't do that: it penalises the devices that have  
>>>>> the
>>>>> least activity.
>>>>>
>>>> I think we are seeing larger than usual number of COMMIT messages.
>>> Using Chuck's nfs-iostats to monitor an NFS write I can see that  
>>> each operation writes about 830MB. Why is so much small than  
>>> wsize=1M?
>>
>>
>> I assume you mean 830KB.
>>
>> Remember that nfs-iostats reports an average transfer size, so you  
>> may be seeing a lot of 1MB writes on the wire, and just enough  
>> small writes to reduce the average.  Or, the client may not be  
>> writing 1MB at all.
>>
>> You have to look at a network trace to see which.
>>
>> On the other hand, 830KB is still very large.
> Apologizes, yes, it is 830KB. If you say it's an average write then  
> my question is why is NFS breaking down 1M writes into smaller  
> chunks? When I say 1M write I'm referring the the user land (dd)  
> calling write() with 1M buffer.

To understand what is really happening (how often is the client not  
sending a full 1MB?  Are the metrics perhaps lying?) you have to  
capture a network trace and look at what's going on.

We've already been through the problems of looking at such a trace  
with wireshark, but perhaps the text-based equivalent (tethereal or  
tshark) will be better about analyzing the packets correctly.   
Then you can use awk or Python to extract a histogram of write sizes.   
You then have immediate graphical evidence of misbehavior.
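
A minimal Python sketch of the histogram step. It assumes the WRITE
byte counts have already been pulled out of the trace as plain
integers; how to extract them depends on the capture tool and is not
shown. The sizes and the 64KB bucket width below are made up.

```python
from collections import Counter


def write_size_histogram(sizes, bucket=64 * 1024):
    """Count writes per bucket; keys are each bucket's lower bound in bytes."""
    return Counter((size // bucket) * bucket for size in sizes)


def render(hist, bucket=64 * 1024):
    """Draw one text bar per non-empty bucket, smallest sizes first."""
    lines = []
    for low in sorted(hist):
        label = f"{low // 1024:5d}K-{(low + bucket) // 1024:5d}K"
        lines.append(f"{label} {'#' * hist[low]}")
    return "\n".join(lines)


# Hypothetical WRITE sizes in bytes, not taken from a real trace
sizes = [1048576, 1048576, 32768, 49152, 1048576, 524288]
hist = write_size_histogram(sizes)
print(render(hist))
```

A tall bar at 1024K with a few short bars below it would point to
occasional small writes; no bar at 1024K at all would mean the client
never sends a full wsize write.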

The client might break up large writes if the VFS breaks them up for  
some reason, or if there is memory pressure that triggers a flush in  
the middle of doing a large 1MB write, or maybe there's a bug...  
looking at the network behavior will give you some clue about where to  
look next.

> I'm trying to understand why NFS has poor write performance. I had 2  
> leads to pursue (1) nfs-iostats shows that each write operation is  
> >100KB smaller than a read operation and (2) I see that during a  
> write nfs-iostats reports fewer operations per second than during a  
> read. The latter can be due to the COMMIT problem.

Yes, and I don't think we know yet whether these are synchronous  
COMMITs (the client waits for the result) or asynchronous COMMITs (the  
client sends the COMMIT request, but keeps writing).

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

