linux-nfs.vger.kernel.org archive mirror
* DoS with NFSv4.1 client
       [not found] <1201078747.580554.1381350008792.JavaMail.zimbra@desy.de>
@ 2013-10-09 20:48 ` Mkrtchyan, Tigran
  2013-10-10  9:56   ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 8+ messages in thread
From: Mkrtchyan, Tigran @ 2013-10-09 20:48 UTC (permalink / raw)
  To: linux-nfs@vger.kernel.org; +Cc: Andy Adamson, Steve Dickson


Hi,

last night one of our NFS clients effectively mounted a DoS
attack on us. The farm node, which was accessing data with pNFS,
went mad and tried to kill the dCache NFS server. As usual,
this happened overnight, so we were not able to capture any
network traffic or bump the debug level.

The symptoms are:

the client starts to bombard the MDS with OPEN requests. Since we
see state being created on the server side, the requests are
processed by the server. Nevertheless, for some reason, the client
is not satisfied with the result. Here is the output of mountstats:

OPEN:
	17087065 ops (99%) 	1 retrans (0%) 	0 major timeouts
	avg bytes sent per op: 356	avg bytes received per op: 455
	backlog wait: 0.014707 	RTT: 4.535704 	total execute time: 4.574094 (milliseconds)
CLOSE:
	290 ops (0%) 	0 retrans (0%) 	0 major timeouts
	avg bytes sent per op: 247	avg bytes received per op: 173
	backlog wait: 308.827586 	RTT: 1748.479310 	total execute time: 2057.365517 (milliseconds)


As you can see, there is a huge imbalance between the number of OPEN and
CLOSE requests (roughly 59,000 OPENs for every CLOSE).
We see the same picture on the server side as well:

NFSServerV41 Stats:                   average±stderr(ns)       min(ns)     max(ns)           Samples
  DESTROY_SESSION                          26056±4511.89        13000        97000                17
  OPEN                                    1197297±  0.00       816000  31924558000          54398533
  RESTOREFH                                     0±  0.00            0  25018778000          54398533
  SEQUENCE                                   1000±  0.00         1000  26066722000          55601046
  LOOKUP                                  4607959±  0.00       375000  26977455000             32118
  GETDEVICEINFO                             13158±100.88         4000       655000             11378
  CLOSE                                  16236211±  0.00         5000  21021819000             20420
  LAYOUTGET                             271736361±  0.00     10003000  68414723000             21095

The last column is the number of requests.

The client is RHEL6.4. Looking at the code, I can see a retry
loop in nfs4proc.c#nfs4_do_open() which could be the cause of
the problem. Nevertheless, I can't find any reason why this
loop turned into an 'infinite' one.
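
For illustration only, the retry pattern I have in mind looks roughly
like the sketch below. This is a hypothetical, stripped-down model
(the enum values and the helper are made up), not the actual kernel
code; it just shows how an uncapped retry-on-error loop can spin
forever if the error never clears:

/*
 * Hypothetical, simplified model of the retry pattern in nfs4_do_open().
 * This is NOT the real kernel code; it only illustrates how an uncapped
 * "retry on recoverable error" loop can spin forever when the error
 * never goes away.
 */
#include <stdio.h>

enum nfs_status { NFS_OK, NFS4ERR_BAD_STATEID, NFS4ERR_SERVERFAULT };

/* Stand-in for one OPEN round trip plus its error handler. */
static enum nfs_status do_open_once(int attempt)
{
    (void)attempt;
    /* Pretend the server keeps handing back state the client rejects. */
    return NFS4ERR_BAD_STATEID;
}

int main(void)
{
    enum nfs_status status;
    int attempt = 0;

    do {
        status = do_open_once(attempt++);
        /*
         * If the real loop retries "recoverable" errors with no upper
         * bound, that would match what we observed: ~17 million OPENs
         * against 290 CLOSEs.
         */
    } while (status != NFS_OK && attempt < 10 /* cap added for the demo */);

    printf("gave up after %d attempts, status = %d\n", attempt, (int)status);
    return 0;
}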

In the end our server ran out of memory and we returned
NFSERR_SERVERFAULT to the client. This triggered the client to
reestablish the session, and all open stateids were
invalidated and cleaned up.

I am still trying to reproduce this behavior (on client
and server) and any hint is welcome.

Tigran.


* Re: DoS with NFSv4.1 client
  2013-10-09 20:48 ` DoS with NFSv4.1 client Mkrtchyan, Tigran
@ 2013-10-10  9:56   ` Mkrtchyan, Tigran
  2013-10-10 14:14     ` Weston Andros Adamson
       [not found]     ` <24F8F3E5-578D-43C1-83B8-F3310526D4AE@netapp.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Mkrtchyan, Tigran @ 2013-10-10  9:56 UTC (permalink / raw)
  To: linux-nfs; +Cc: Andy Adamson, Steve Dickson



Today we were 'lucky' enough to hit the same situation during
the daytime. Here is what happens:

The client sends an OPEN and gets an open stateid.
This is followed by LAYOUTGET ... and a READ to the DS.
At some point, the server returns BAD_STATEID.
This triggers the client to issue a new OPEN and to use the
new open stateid in the READ request to the DS. As the new
stateid is not known to the DS, it keeps returning
BAD_STATEID, and this becomes an infinite loop.
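
A minimal toy model of this interaction is sketched below. It is
purely illustrative (stateids are plain integers, and it assumes the
DS learns a stateid only at LAYOUTGET time, which is how our server
currently behaves); it just shows why the recovery never converges:

/*
 * Toy model of the OPEN -> LAYOUTGET -> READ-to-DS loop described above.
 * Stateids are plain integers and "ds_known" is the only stateid the DS
 * ever learns (at LAYOUTGET time). Illustration only, not real client or
 * dCache code.
 */
#include <stdbool.h>
#include <stdio.h>

static int mds_open(void)
{
    static int next_stateid = 1;
    return next_stateid++;              /* each OPEN hands out a new stateid */
}

static bool ds_read(int stateid, int ds_known)
{
    return stateid == ds_known;         /* the DS accepts only what it knows */
}

int main(void)
{
    int stateid  = mds_open();          /* OPEN: client gets stateid 1      */
    int ds_known = stateid;             /* LAYOUTGET: stateid pushed to DS  */

    /* At some point the MDS discards the old state, the client gets
     * BAD_STATEID and starts recovery with a fresh OPEN.                  */
    stateid = mds_open();               /* recovery OPEN: client gets 2     */

    for (int retry = 0; retry < 5; retry++) {
        if (ds_read(stateid, ds_known)) {
            printf("READ ok with stateid %d\n", stateid);
            return 0;
        }
        /* The DS never learns the new stateid, so it answers BAD_STATEID
         * again and the client simply opens once more: an endless loop.   */
        printf("retry %d: DS rejected stateid %d (it only knows %d)\n",
               retry, stateid, ds_known);
        stateid = mds_open();
    }
    printf("still looping ... (capped at 5 retries for this demo)\n");
    return 0;
}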

Regards,
   Tigran.

 


* Re: DoS with NFSv4.1 client
  2013-10-10  9:56   ` Mkrtchyan, Tigran
@ 2013-10-10 14:14     ` Weston Andros Adamson
  2013-10-10 14:35       ` Weston Andros Adamson
       [not found]     ` <24F8F3E5-578D-43C1-83B8-F3310526D4AE@netapp.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Weston Andros Adamson @ 2013-10-10 14:14 UTC (permalink / raw)
  To: Mkrtchyan, Tigran
  Cc: <linux-nfs@vger.kernel.org>, Adamson, Andy, Steve Dickson

So is this a server bug? It seems like the client is behaving correctly...

-dros


* Re: DoS with NFSv4.1 client
  2013-10-10 14:14     ` Weston Andros Adamson
@ 2013-10-10 14:35       ` Weston Andros Adamson
  2013-10-10 14:48         ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 8+ messages in thread
From: Weston Andros Adamson @ 2013-10-10 14:35 UTC (permalink / raw)
  To: Mkrtchyan, Tigran
  Cc: <linux-nfs@vger.kernel.org>, Adamson, Andy, Steve Dickson

Well, it'd be nice not to loop forever, but my question remains, is this due to a server bug (the DS not knowing about new stateid from MDS)?

-dros


* Re: DoS with NFSv4.1 client
       [not found]         ` <194A4163-11C3-453B-9E3C-6B1460A7E57A@netapp.com>
@ 2013-10-10 14:42           ` Adamson, Andy
  0 siblings, 0 replies; 8+ messages in thread
From: Adamson, Andy @ 2013-10-10 14:42 UTC (permalink / raw)
  To: Adamson, Andy
  Cc: Tigran Mkrtchyan, linux-nfs@vger.kernel.org list, Steve Dickson,
	Weston Andros Adamson

Sorry - I answered this email thread from my netapp account and didn't cc the list.

-->Andy

On Oct 10, 2013, at 10:19 AM, "Adamson, Andy" <William.Adamson@netapp.com> wrote:

> 
> On Oct 10, 2013, at 10:03 AM, "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> wrote:
> 
>> Not only. Once we were able to reproduce it and fix it on the server,
>> we saw that in the end the client sends only one CLOSE.
> 
> I don't understand. If it is fixed on the server, then the client will send an OPEN, get an open stateid - say OS-1 - do a LAYOUTGET, and READ to the DS using OS-1. The server then returns BAD_STATEID on the READ.
> 
> The client then goes through stateid recovery, which means issuing another OPEN to get OS-2, which is then used for the DS READs.
> 
> The client then CLOSEs the file using OS-2.
> 
> Are you saying that the client does not close using OS-1? Note that that is impossible, as OS-1 is a BAD stateid....
> 
> -->Andy
> 
>> 
>> Tigran.
>> 
>> ----- Original Message -----
>>> From: "Andy Adamson" <William.Adamson@netapp.com>
>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>>> Sent: Thursday, October 10, 2013 3:55:55 PM
>>> Subject: Re: DoS with NFSv4.1 client
>>> 
>>> OK - so it's a server bug,
>>> 
>>> -->Andy

* Re: DoS with NFSv4.1 client
  2013-10-10 14:35       ` Weston Andros Adamson
@ 2013-10-10 14:48         ` Mkrtchyan, Tigran
  2013-10-10 15:11           ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 8+ messages in thread
From: Mkrtchyan, Tigran @ 2013-10-10 14:48 UTC (permalink / raw)
  To: Weston Andros Adamson; +Cc: linux-nfs, Andy Adamson, Steve Dickson



----- Original Message -----
> From: "Weston Andros Adamson" <dros@netapp.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "<linux-nfs@vger.kernel.org>" <linux-nfs@vger.kernel.org>, "Andy Adamson" <William.Adamson@netapp.com>, "Steve
> Dickson" <steved@redhat.com>
> Sent: Thursday, October 10, 2013 4:35:25 PM
> Subject: Re: DoS with NFSv4.1 client
> 
> Well, it'd be nice not to loop forever, but my question remains, is this due
> to a server bug (the DS not knowing about new stateid from MDS)?
> 

Up to now, we have pushed the open stateid to the DS only on LAYOUTGET.
This has to be changed, as that behaviour is not spec compliant.
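
In other words, the DS should validate the stateid against the MDS's
live state on every I/O instead of only remembering what was pushed
at LAYOUTGET time. Our server is Java, but here is a rough,
language-neutral sketch of the difference in C (all names are
invented, not real dCache code):

/*
 * Illustrative sketch only: the names (layoutget_snapshot,
 * mds_lookup_state, ...) are invented and do not correspond to real
 * dCache code. It just contrasts "remember the stateid pushed at
 * LAYOUTGET time" with "ask the MDS state table on every I/O".
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct stateid {
    unsigned char other[12];
    unsigned int  seqid;
};

/* Old behaviour: the DS only accepts the stateid it saw at LAYOUTGET. */
static struct stateid layoutget_snapshot;   /* filled in at LAYOUTGET time */

static bool ds_check_stateid_old(const struct stateid *sid)
{
    return memcmp(sid->other, layoutget_snapshot.other,
                  sizeof sid->other) == 0;
}

/* New behaviour: the DS consults the live MDS open/layout state on every
 * READ/WRITE, so a stateid created during recovery is accepted as well.  */
static bool mds_lookup_state(const struct stateid *sid)
{
    (void)sid;
    return true;    /* placeholder for a query against the shared state table */
}

static bool ds_check_stateid_new(const struct stateid *sid)
{
    return mds_lookup_state(sid);
}

int main(void)
{
    struct stateid recovered = { .other = { 2 }, .seqid = 1 };

    printf("stateid from recovery: old check = %d, new check = %d\n",
           ds_check_stateid_old(&recovered), ds_check_stateid_new(&recovered));
    return 0;
}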

Tigran.


* Re: DoS with NFSv4.1 client
  2013-10-10 14:48         ` Mkrtchyan, Tigran
@ 2013-10-10 15:11           ` Mkrtchyan, Tigran
  2013-10-10 15:39             ` Adamson, Andy
  0 siblings, 1 reply; 8+ messages in thread
From: Mkrtchyan, Tigran @ 2013-10-10 15:11 UTC (permalink / raw)
  To: Weston Andros Adamson; +Cc: linux-nfs, Andy Adamson, Steve Dickson



This is probably a question for the IETF working group, but anyway:
if my layout has the 'return-on-close' flag set and the open stateid
is no longer valid, should the client expect the layout to still be valid?

Tigran.


* Re: DoS with NFSv4.1 client
  2013-10-10 15:11           ` Mkrtchyan, Tigran
@ 2013-10-10 15:39             ` Adamson, Andy
  0 siblings, 0 replies; 8+ messages in thread
From: Adamson, Andy @ 2013-10-10 15:39 UTC (permalink / raw)
  To: Mkrtchyan, Tigran
  Cc: Weston Andros Adamson, linux-nfs, Adamson, Andy, Steve Dickson


On Oct 10, 2013, at 11:11 AM, "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> wrote:

> 
> 
> This is probably a question for the IETF working group, but anyway:
> if my layout has the 'return-on-close' flag set and the open stateid
> is no longer valid, should the client expect the layout to still be valid?

Here is my take:

The layout stateid is constructed from the first open stateid when pNFS I/O is first attempted on that file. Once the layout has been returned to the client successfully, the layout stateid is independent of the open stateid used to construct it.
So if that open stateid, or any other open stateid, goes bad, the layout stateid is still valid.

WRT return-on-close, the invalid open stateid means there is no CLOSE until the open stateid has been recovered (CLAIM_PREVIOUS) and the CLOSE call has a valid stateid. No CLOSE on an invalid stateid means no return-on-close for the invalid stateid, which means the layout is still valid until the CLOSE that uses the recovered open stateid.
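
Just to spell the logic out, here is a rough sketch (not real client
code; the helpers are made-up stand-ins) of the close path I am
describing:

/*
 * Sketch of the return-on-close reasoning above; not real client code.
 * The helpers are made-up stand-ins. The point: a layout marked
 * return-on-close only goes away when a CLOSE is actually sent, and a
 * CLOSE needs a valid open stateid.
 */
#include <stdbool.h>
#include <stdio.h>

static bool layout_valid = true;

static bool stateid_is_valid(int open_stateid)
{
    return open_stateid > 0;
}

static int recover_open_stateid(void)
{
    return 2;                       /* e.g. OS-2 from the recovery OPEN */
}

static void do_close(int open_stateid, bool return_on_close)
{
    if (!stateid_is_valid(open_stateid)) {
        /* No CLOSE can be sent with a bad stateid, so the layout is
         * still usable; the open stateid has to be recovered first.   */
        open_stateid = recover_open_stateid();
    }
    printf("CLOSE with stateid %d\n", open_stateid);
    if (return_on_close)
        layout_valid = false;       /* the layout is only dropped here */
}

int main(void)
{
    do_close(-1 /* OS-1 went bad */, true);
    printf("layout valid after CLOSE: %s\n", layout_valid ? "yes" : "no");
    return 0;
}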

-->Andy



Thread overview: 8+ messages
     [not found] <1201078747.580554.1381350008792.JavaMail.zimbra@desy.de>
2013-10-09 20:48 ` DoS with NFSv4.1 client Mkrtchyan, Tigran
2013-10-10  9:56   ` Mkrtchyan, Tigran
2013-10-10 14:14     ` Weston Andros Adamson
2013-10-10 14:35       ` Weston Andros Adamson
2013-10-10 14:48         ` Mkrtchyan, Tigran
2013-10-10 15:11           ` Mkrtchyan, Tigran
2013-10-10 15:39             ` Adamson, Andy
     [not found]     ` <24F8F3E5-578D-43C1-83B8-F3310526D4AE@netapp.com>
     [not found]       ` <22165921.590399.1381413800631.JavaMail.zimbra@desy.de>
     [not found]         ` <194A4163-11C3-453B-9E3C-6B1460A7E57A@netapp.com>
2013-10-10 14:42           ` Adamson, Andy
