From: "Iozone" <capps@iozone.org>
To: "'Trond Myklebust'" <trond.myklebust@primarydata.com>
Cc: "'Linux NFS Mailing List'" <linux-nfs@vger.kernel.org>
Subject: RE: FW: Forwarding request at suggestion from support
Date: Wed, 04 Jun 2014 16:36:05 -0500
Message-ID: <006301cf803c$fe1f2200$fa5d6600$@iozone.org>
In-Reply-To: <CAHQdGtR6CL-M4gi8qMEoX+E65gA8X2AHA3ya_B__T3KNcCBVSw@mail.gmail.com>
Trond,
I have traces where there are indeed a bunch of async reads issued, and
the replies come back. One with data, and all of the rest with zero bytes
transferred, indicating EOF. This was followed by a bunch more async
reads, all of which come back with zero bytes transferred. It appears
that if the user requested 16MB, and the file was 4k, then there will
be 16MB of transfers issued regardless of the fact that all but one
are returning zero bytes....
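To put rough numbers on this, here is a minimal sketch of the arithmetic, not taken from the traces themselves. It assumes the read starts at offset 0 and that every request is split into rsize-sized READs; the 128k rsize comes from the trace in the bug report and is only an assumption for the 16MB case. The function name count_read_rpcs is made up for illustration.

```python
def count_read_rpcs(request_bytes, file_bytes, rsize):
    """Return (total_rpcs, rpcs_with_data, zero_length_rpcs) for one O_DIRECT read().

    Assumes the read starts at offset 0 and that the client splits the whole
    request into rsize-sized READs without consulting its cached file size.
    """
    # Every rsize-sized chunk of the request goes on the wire (ceiling division).
    total = -(-request_bytes // rsize)
    # Only chunks that start before EOF can return any data.
    readable = min(request_bytes, file_bytes)
    with_data = -(-readable // rsize)
    return total, with_data, total - with_data


KIB, MIB = 1024, 1024 * 1024

# Case from the bug report: 32KB file, bs=1M, 128k READs on the wire.
print(count_read_rpcs(1 * MIB, 32 * KIB, 128 * KIB))   # (8, 1, 7)

# Case described above: 4k file, 16MB request (128k rsize is an assumption).
print(count_read_rpcs(16 * MIB, 4 * KIB, 128 * KIB))   # (128, 1, 127)
```

Under those assumptions, the 16MB request against a 4k file would produce 128 READs, 127 of which come back with zero bytes.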
Business case:
Not only could this impact benchmarks, but it also has the potential
to open the door to a DoS-type attack on an NFS server. All it would take
is one small file, and a bunch of clients going after 1GB reads on that file
with O_DIRECT, and the poor NFS server is going to get slammed with
requests at a phenomenal rate (as the client is issuing these back-to-back
async, and the server is responding with back-to-back zero length
transfer replies). The client burns very little CPU, and the NFS server
is buried, doing zero length transfers... pretty much in a very tight loop....
Thank you,
Don Capps
-----Original Message-----
From: Trond Myklebust [mailto:trond.myklebust@primarydata.com]
Sent: Wednesday, June 04, 2014 4:15 PM
To: capps@iozone.org
Cc: Linux NFS Mailing List
Subject: Re: FW: Forwarding request at suggestion from support
On Wed, Jun 4, 2014 at 5:03 PM, Iozone <capps@iozone.org> wrote:
> Trond,
>
> Ok... but as the replies are coming back, all but one with EOF and zero bytes
> transferred, does it still make sense to keep issuing reads that are beyond EOF ?
It depends. The reads should all be sent asynchronously, so it isn't clear to me that the client will see the EOF until all the RPC requests are in flight.
That said, it is true that we do not have any machinery right now to stop further submissions if we see that we have already collected enough information to complete the read() syscall. Are there any good use cases for O_DIRECT that justify adding such machinery? Oracle doesn't seem to need it.
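For illustration only, the trade-off can be modelled with a toy simulation. This is not the kernel's code and not a proposed patch: simulate_direct_read and its window parameter are hypothetical. With window=None every READ is submitted before any reply is examined, as described above; with window=N the model submits N READs, looks at the replies, and stops once one of them comes back short. EOF is decided by the server's replies (file_bytes here stands for the server's view of the file), never by a cached size.

```python
def simulate_direct_read(request_bytes, file_bytes, rsize, window=None):
    """Return (rpcs_sent, bytes_read) for a read() starting at offset 0.

    window=None: all READs are submitted before any reply is examined.
    window=N:    hypothetical alternative; submit N READs at a time and
                 stop submitting once a reply is shorter than requested.
    """
    offsets = list(range(0, request_bytes, rsize))
    if window is None:
        window = max(len(offsets), 1)
    rpcs_sent = 0
    bytes_read = 0
    for start in range(0, len(offsets), window):
        batch = offsets[start:start + window]
        rpcs_sent += len(batch)          # the whole batch is already in flight
        saw_eof = False
        for off in batch:
            want = min(rsize, request_bytes - off)
            # The server returns whatever lies between off and its EOF.
            got = max(0, min(want, file_bytes - off))
            bytes_read += got
            if got < want:
                saw_eof = True
        if saw_eof:
            break                        # enough information to finish read()
    return rpcs_sent, bytes_read


KIB, MIB = 1024, 1024 * 1024
print(simulate_direct_read(16 * MIB, 4 * KIB, 128 * KIB))            # (128, 4096)
print(simulate_direct_read(16 * MIB, 4 * KIB, 128 * KIB, window=8))  # (8, 4096)
```

In this model the application gets the same 4096 bytes either way; only the number of READs on the wire changes, at the cost of waiting for replies between batches.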
Cheers
Trond
> Enjoy,
> Don Capps
>
> -----Original Message-----
> From: Trond Myklebust [mailto:trond.myklebust@primarydata.com]
> Sent: Wednesday, June 04, 2014 3:42 PM
> To: capps@iozone.org
> Cc: Linux NFS Mailing List
> Subject: Re: FW: Forwarding request at suggestion from support
>
> Hi Don,
>
> On Wed, Jun 4, 2014 at 2:02 PM, Iozone <capps@iozone.org> wrote:
>>
>>
>> From: Iozone [mailto:capps@iozone.org]
>> Sent: Wednesday, June 04, 2014 11:39 AM
>> To: linux-nfs@vger.kernel.org
>> Subject: Forwarding request at suggestion from support
>>
>> Dear kernel folks,
>>
>> Please take a look at Bugzilla bug:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1104696
>>
>> Description of problem:
>>
>> Linux NFSv3 clients can issue extra reads beyond EOF.
>>
>> Condition of the test: (32KB_file is a file that is 32KB in size)
>> File is being read over an NFSv3 mount.
>>
>> dd if=/mnt/32KB_file of=/dev/null iflag=direct bs=1M count=1
>>
>> What one should expect over the wire:
>> NFSv3_read for 32k, or NFSv3_read for 1M
>> NFSv3_read Reply return of 32KB and EOF set.
>>
>> What happens with Linux NFSv3 client:
>> NFSv3 read for 128k
>> NFSv3 read for 128k,
>> NFSv3 read for 128k,
>> NFSv3 read for 128k,
>> NFSv3 read for 128k,
>> NFSv3 read for 128k,
>> NFSv3 read for 128k,
>> NFSv3 read for 128k.
>> followed by:
>> NFSv3 read reply of 32k,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0,
>> NFSv3 read reply of 0.
>>
>> So… instead of a single round trip with a short read length returned,
>> there were 8 async I/O ops sent to the NFS server, and 8 replies from
>> the NFS server.
>> The client knew the file size before even sending the very first
>> request, but went ahead and issued a large number of reads that it
>> should have known were beyond EOF.
>>
>> This client behavior hammers NFS servers with requests that are
>> guaranteed to always fail, and burn CPU cycles, for operations that
>> it knew were pointless.
>>
>> While the application is getting correct answers to the API calls,
>> the poor client and server are beating each other senseless over the wire.
>>
>> NOTE: This only happens if O_DIRECT is being used… (thus the
>> iflag=direct)
>
> Yes. This behaviour is intentional in the case of O_DIRECT. The reason why we should not change it is that we don't ever want to rely on cached values for the file size when doing uncached I/O.
> An application such as Oracle may have out-of-band information about writes to the file that were made by another client directly to the server, in which case it would be wrong for the kernel to truncate those reads based on its cached information.
>
> Cheers
> Trond
>
> --
> Trond Myklebust
>
> Linux NFS client maintainer, PrimaryData
>
> trond.myklebust@primarydata.com
>
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com