Linux CIFS client module: login rate limiting

* Linux CIFS client module: login rate limiting
@ 2017-01-19  7:48 Valentin Hilbig
       [not found] ` <58806F39.9010801-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Valentin Hilbig @ 2017-01-19  7:48 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Hello Linux Kernel CIFS-List,

please forgive me for ninja-registering to the list and starting my first 
post right away with questions.  I do this in the hope of saving your time. 
The long background story is below in case you are interested:

Q1) Is it possible to implement caching of failed CIFS/SMB 
authentication replies in the CIFS client?  My wish is to cache those 
negative replies for just one second (HZ), as 3600 retries per hour to 
re-establish a lost connection to a CIFS server seem enough: enough to 
succeed, and enough in case of semi-permanent failures.  I'd like to see 
this 1000ms cache as the mount default, as it does not affect the initial 
request, only the subsequent retries.  A default of 0 (no cache) is OK 
for me, too, as it can then be changed at mount time.
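
A minimal userspace sketch of the negative cache asked for in Q1 might
look like the following.  It is not kernel code and not an existing CIFS
interface: the struct, the function names and the millisecond clock are
all hypothetical, chosen only to illustrate the logic.  A real patch
would presumably use jiffies and sit somewhere in the session setup path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-session state; none of these names exist in fs/cifs. */
struct auth_neg_cache {
	uint64_t interval_ms;  /* cache lifetime; 0 disables caching */
	uint64_t fail_time_ms; /* time at which the last result was recorded */
	int cached_err;        /* 0 = nothing cached, else a negative errno */
};

/*
 * Returns the cached negative errno while a recorded failure is still
 * fresh, or 0 to tell the caller to ask the server again.
 */
static int auth_cache_check(const struct auth_neg_cache *c, uint64_t now_ms)
{
	if (c->interval_ms == 0 || c->cached_err == 0)
		return 0;
	if (now_ms - c->fail_time_ms < c->interval_ms)
		return c->cached_err;
	return 0;
}

/* Records the outcome of a real authentication attempt (0 or -errno). */
static void auth_cache_record(struct auth_neg_cache *c, int rc,
			      uint64_t now_ms)
{
	assert(rc <= 0); /* errors are passed as negative errno values */
	c->cached_err = rc;
	c->fail_time_ms = now_ms;
}
```

The caller would consult auth_cache_check() before sending a session
setup and call auth_cache_record() afterwards; recording 0 (success)
clears the cache, and interval_ms = 0 gives today's behaviour.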

Q2) As an extension I would also like to see something like a maximum 
retry counter, which declares a CIFS mount dead if we do not succeed 
after N negative replies.  In my case N=40000 (at least about 11 hrs 
with a 1s cache time) sounds good.  However, the rate limiting is much 
more important than deactivating a rogue CIFS mount.  Hence the mount 
default should be N=0, which means infinite retries (as it is today).
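
As an illustration of Q2 only, a retry limit could be tracked roughly as
sketched below.  All names are hypothetical and this is plain userspace
C, not a proposal for the actual kernel data structures.  The sketch
assumes that the counter covers consecutive failures and is reset by a
successful authentication, which the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount retry accounting. */
struct retry_limit {
	uint32_t max_failures; /* N; 0 means retry forever */
	uint32_t failures;     /* consecutive negative replies so far */
	bool dead;             /* set once the limit has been reached */
};

/* Feed the result of one authentication attempt into the counter. */
static void retry_limit_on_result(struct retry_limit *r, bool auth_ok)
{
	assert(r != NULL);
	if (auth_ok) {
		r->failures = 0; /* assumption: success resets the count */
		return;
	}
	if (r->max_failures == 0)
		return; /* unlimited retries, as today */
	if (++r->failures >= r->max_failures)
		r->dead = true;
}

/* May the client still try to re-authenticate this mount? */
static bool retry_limit_may_try(const struct retry_limit *r)
{
	return !r->dead;
}

/* Shortest time until the mount is declared dead, in seconds. */
static uint64_t retry_limit_min_seconds(uint32_t n, uint32_t interval_ms)
{
	return (uint64_t)n * interval_ms / 1000;
}
```

With N=40000 and a 1000ms interval, retry_limit_min_seconds() gives
40000 seconds, i.e. 11 hours, 6 minutes and 40 seconds.  This is a lower
bound, because failures are only counted when something actually
accesses the mount.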

Q3) According to 
https://www.kernel.org/doc/readme/Documentation-filesystems-cifs-README 
these features do not exist (yet).  Are such features planned for the 
kernel CIFS client module?  If not, is there a chance for me to get 
patches upstream if I provide them?  Is there more to consider than 
just following the style guide (and providing kernel-grade code)?  Of 
course I will extend the sysctl/proc interface for those new mount 
options in a compatible way (or discuss this with the list before I 
break compatibility).  However, my patches will be for "our" kernels 
used here (3.13 and 4.4), so they may need some porting/upgrading for 
the latest kernel (I am not sure that I will get permission to take the 
time to provide patches for the current kernel as well).

Sorry if some of these are FAQs, but as gmane.org is currently 
down/blank, I do not have access to the archive of kernel.cifs.

If you have some better ideas, please feel free to criticize me ;)

Thanks,
-Tino
PS: FYI full long (sorry!) details follow in case you are interested:

(Sorry for missing logs and plain prose, I have no access to the test 
installation ATM, because it belongs to another group.)

Here at LiMux (Linux for Munich), in certain situations (for example 
when the user has changed the password in LDAP), we observe that CIFS 
clients may send 30 or more failing CIFS-setup-requests per second(!) to 
the CIFS server for an existing (old) CIFS-mount.  Each of these requests 
tries to (re-)authenticate against AD/LDAP but fails, because the 
credentials are no longer valid.  After a short while the brute force 
protection of the AD kicks in and blocks the AD-client (in this 
case the CIFS server) from accessing AD (for a while).  This means that 
other clients are affected by the faulty CIFS-mounts and are prevented 
from authenticating against the CIFS server.

The CIFS-Server-people cannot help, as the CIFS server's vendor (no, not 
Microsoft) tells us to switch off brute-force-protection on the AD side, 
which is something we do not want to do for obvious reasons.  The AD 
shall continue to block IPs with too many wrong requests.  So the only 
option we have is to do something about the high rate of AD-requests 
with a wrong password coming from CIFS clients.

To observe the effect, the following must happen:

- There is an old CIFS mount (for example a User's $HOME), which is 
already successfully mounted and working.

- The TCP session to the CIFS server breaks (for example due to 
inactivity or a short network outage; I used "tcpkill" to simulate 
that), such that the kernel's CIFS module needs to re-establish a 
connection to the CIFS server for the next access, which then triggers 
re-authentication with the stored credentials.

- This re-authentication fails, due to a password change or a locked 
account on the AD side.  (If it succeeds there is no problem, as 
the CIFS mount is then fully functional again.  The problem starts 
when this re-authentication does not work.)

- And there also must be some culprit, in my case some user process (we 
haven't identified it yet but think it's something like Thunderbird), 
which tries to access the CIFS share in some looping fashion.  (I used 
"while sleep 0.1; do touch /path/to/share/FILE; done" to test it.)

Please note that there are too many possible user space applications out 
there which could rapidly hammer a defunct CIFS mount, such that you 
won't be able to fix them all.  Hence we need a fix on some other level.

(BTW we use version=1 of the protocol, and we require it; upgrading 18k 
Linux workstations plus infrastructure against politics ain't easy.)

The CIFS module just forwards the request(s) to the CIFS server, and, as 
the TCP-connection is broken, tries to establish a new one.  This 
triggers authentication, but the authentication fails.  So the 
CIFS-client sees a negative reply like NT ACCOUNT LOCKED OUT, and 
answers something like "permission denied" to the userspace.  So far, so 
correct, everything works perfectly as it should!

The problem starts when some userspace application starts to loop over 
the fault, thereby accessing the CIFS share over and over again, several 
times a second.  Then the CIFS module continues to do its job, but it 
does it far too well.  Every single userspace access tries to 
re-open the session to the CIFS server, again and again, which means we 
see a massive number of authentication requests to the server, all of 
which are doomed.  Even worse, the faster the server and the better the 
network, the more such failing requests you will see, of course.  This 
triggers the AD brute force protection even faster.

However, if those few CIFS-clients which "freak out" were limited 
to sending only 1 request per second, AD would not see too many failed 
requests per timespan, so everything would stay operable.
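
To put rough numbers on this, here is a small simulation sketch.  It
rests on simplifying assumptions that are not from any measurement: a
single process accesses the share at a fixed period (like the 0.1s test
loop above), every access that is let through causes exactly one
authentication request, and every request fails.  The function name and
parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Counts how many authentication requests reach the server during
 * duration_ms when the share is accessed every access_period_ms and
 * the client waits at least holdoff_ms between two requests.
 * holdoff_ms = 0 models the behaviour without rate limiting.
 */
static unsigned count_auth_requests(unsigned access_period_ms,
				    unsigned holdoff_ms,
				    unsigned duration_ms)
{
	unsigned sent = 0;
	uint64_t next_allowed_ms = 0;

	assert(access_period_ms > 0); /* otherwise the loop never advances */

	for (uint64_t t = 0; t < duration_ms; t += access_period_ms) {
		if (t >= next_allowed_ms) {
			sent++;
			next_allowed_ms = t + holdoff_ms;
		}
		/* else: answered locally, nothing goes to the server */
	}
	return sent;
}
```

Under these assumptions, one minute of the 0.1s loop produces 600
requests without a hold-off (count_auth_requests(100, 0, 60000)) and 60
requests with a 1000ms hold-off (count_auth_requests(100, 1000, 60000)).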

But even if this is implemented, this is only half of the story (the 
important half, but there is more to it):

With rate limiting in place, the AD and the CIFS server are no longer 
affected.  But we still have the user account locked by the failing AD 
requests.  Let's go through the case again from the beginning, under 
the assumption that we have failed-authentication reply caching with a 
1s retry:

- The user changes his password (perhaps using Windows, not Linux) but 
does not log out afterwards (on Linux).

- The TCP-session of the CIFS mount breaks for some reason.

- Some userspace process tries to access this CIFS mount in the looping 
fashion.

- The Kernel's CIFS-module tries to re-establish the connection.

- The request fails due to the old credentials.  (As above: Windows has 
the new password, but Linux does not.)

- After 5 such failed retries (as seen from the CIFS server) the AD locks 
the account.  Now the Linux client sees NT ACCOUNT LOCKED (sp?).  This 
takes 5 seconds.

- When the user comes back to work the next day and tries to log in, his 
account is locked, of course.

- He calls Help Desk to get his account unlocked.  They do it.

- But 5s later his account is locked again, thanks to the 5 retries seen 
from the old login on the Linux client.

- Wash, rinse, repeat.

Eventually the user finds out where he is still logged in and logs out, 
such that (in our case) the user's (automated, yet no longer working) 
CIFS-mounts vanish, too.  Until then the user cannot work normally, and 
it usually takes a lot of effort from other people to solve the riddle 
of where the login hides.

This is why I asked Q2, which would allow us to configure that after 11 
hours (or so) the CIFS mount ceases to exist, such that the CIFS client 
stops trying to re-establish the connection.  This means that by the 
next business day the CIFS mount has very likely been invalidated (it is 
still mounted, but quiet on the Linux side), such that the user can have 
his account unlocked without trouble.

This is a triple-win situation: it not only helps the users and 
relieves Help Desk of the burden of diagnosing a hard-to-diagnose 
situation, it also saves the network bandwidth and processing power 
wasted today on all those fruitless authentication requests.  Sigh.

I agree that all this is not the fault of the CIFS module.  However, it 
is better to be nice and polite to the infrastructure in case 
something stupid happens than to continue as usual and thereby waste 
resources and possibly impact others, even when you are within your 
rights in doing so.

(This is a technical list, so I do not introduce myself, because I am 
not important.  All you need to know is that I know Linux from 0.99 and 
I am able to hack the kernel, but until now only for my very own needs. 
BTW, my private GitHub is https://github.com/hilbix/)

Thanks for any help or comments,

-Tino

-- 
Kind regards
Valentin Hilbig
External contractor

IT@M - Service provider for information and telecommunications technology 
of the City of Munich (Landeshauptstadt München)
Business unit Tools and Infrastructure
Service area Municipal Workstations
Service team LiMux Workstation I23
LiMux-Basisclient

Room A2.030, Agnes-Pockels-Bogen 21, 80992 München

Tel.: +49 89 233-782273
E-Mail: externer.dl.hilbig-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org


* Re: Linux CIFS client module: login rate limiting
       [not found] ` <58806F39.9010801-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org>
@ 2017-01-20 21:30   ` Steve French
       [not found]     ` <CAH2r5mtrOqucTBXE3Ni02gWGVBG+o-EbgdVarL1xZjWv0S2xyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Steve French @ 2017-01-20 21:30 UTC (permalink / raw)
  To: Valentin Hilbig, Sachin Prabhu
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

A couple of quick questions:
1) I would not expect the "hard" vs "soft" mount option to make a
difference here, but just double-checking
2) How does smb2 reconnect behave in the same scenario (because we
prefer smb3 to be used if the server is non-Samba)?

Looks like a fix is doable - see lines 1464-1465 of fs/cifs/sess.c

    while (sess_data->func)
        sess_data->func(sess_data);

Looking at cifs_reconnect: in the case where the IP address is not
available we wait 3 seconds (if needed, to retry), and when that
succeeds we schedule delayed work to issue an "echo" (see
cifs_reconnect).  Then, as we do cifs_reconnect_tcon, we could wait up
to 10 seconds at a time for the socket to come back.  If the socket is
OK we do a negotiate protocol, which is not necessarily retried on
failure (depending on the request it can return EAGAIN - e.g.
read/write/lock/close).  If the negprot succeeds we get to your case,
where we call cifs_setup_session in fs/cifs/connect.c, which calls
CIFS_SessSetup (in fs/cifs/sess.c), which looks like it will loop on
the sessionsetup retry for the cifs case - which should, as you note,
be rate limited (especially in the bad password case).

I would also like Sachin's feedback, as he did some significant
cleanup of session establishment for cifs and rewrote this - I wanted
to see whether he would prefer to place the throttling of retries
elsewhere

-- 
Thanks,

Steve


* Re: Linux CIFS client module: login rate limiting
       [not found]     ` <CAH2r5mtrOqucTBXE3Ni02gWGVBG+o-EbgdVarL1xZjWv0S2xyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-01-23 10:57       ` Sachin Prabhu
       [not found]         ` <1485169046.17488.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-01-23 12:13       ` Valentin Hilbig
  1 sibling, 1 reply; 6+ messages in thread
From: Sachin Prabhu @ 2017-01-23 10:57 UTC (permalink / raw)
  To: Steve French, Valentin Hilbig
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2017-01-20 at 15:30 -0600, Steve French wrote:
> A couple quick questions:
> 1) I would not expect "hard" vs "soft" mount option makes no
> difference here, but just doublechecking
> 2) How does smb2 reconnect behave in the same scenario (because we
> prefer smb3 to be used if the server is non-Samba)?
> 
> Looks like a fix is doable - see line 1464-1465 of fs/cifs/sess.c
> 
>     while (sess_data->func)
>         sess_data->func(sess_data);
> 
> looking at cifs_reconnect in the case where the ip address is not
> available we wait 3 seconds (if needed to retry), and when that
> succeeds we schedule delayed work to issue an "echo" (see
> cifs_reconnect) and then as we do cifs_reconnect_tcon we could wait up
> to 10 seconds at a time for the socket to come back. If socket is ok
> we do a negotiate protocol which is not necessarily retried on failure
> (depending on the request it can return EAGAIN - e.g.
> read/write/lock/close).  If the negprot succeeds we get to your case
> where we call cifs_setup_session in fs/cifs/connect.c which calls
> CIFS_SessSetup (in fs/cifs/sess.c) which looks like it will loop on
> the sessionsetup retry for the cifs case - which should as you note
> rate limit (especially on bad password case).
> 
> I also would like Sachin's feedback as he made some significant
> cleanup of session establishment for cifs and rewrote this - wanted to
> see if he wanted to move the throttling of retries differently

I think the suggestion is perfectly valid and would be a nice addition
to the cifs module. Maybe a better place to add this change would be at

cifs_reconnect_tcon()
{
..
        mutex_lock(&ses->session_mutex);
        rc = cifs_negotiate_protocol(0, ses);
        if (rc == 0 && ses->need_reconnect)
                rc = cifs_setup_session(0, ses, nls_codepage);
..
}
where, in case of EACCES, we can set up delayed work to unlock
ses->session_mutex, set to run after the required interval.

Sachin Prabhu

> 
> On Thu, Jan 19, 2017 at 1:48 AM, Valentin Hilbig
> <externer.dl.hilbig-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org> wrote:
> > Hello Linux Kernel CIFS-List,
> > 
> > please forgive me to ninja-register to the list and start my
> > firstpost right
> > with the questions.  This is done in the hope to save your time.
> > The long
> > background story is below in case you are interested:
> > 
> > Q1) Is it possible on the CIFS client to implement caching for
> > failed
> > CIFS/SMB authentication replies?  My wish is to cache those
> > negative replies
> > just a second (HZ), as 3600 retries per hour to re-establish a lost
> > connection to a CIFS server seems enough.  Enough to succeed and
> > enough on
> > semi-permanent failures.  I'd like to see this 1000ms cache as a
> > mount
> > default, as it's not for the initial request, just for the
> > subsequent
> > retries, but setting it to 0 (no cache) is ok for me, too, as it
> > then can be
> > changed at mount-time.
> > 
> > Q2) As an extension I also would like to see something like a
> > maximum retry
> > counter, which declares a CIFS mount dead if we do not succeed
> > after N
> > negative replies.  In my case N=40000 (around at least 11 hrs for
> > 1s cache
> > time) sounds good.  However the rate-limiting is much more
> > important than
> > deactivating a rogue CIFS mount.  Hence mount's default should be
> > N=0, which
> > means, infinite retries (as it is today).
> > 
> > Q3) According to
> > https://www.kernel.org/doc/readme/Documentation-filesystems-cifs-RE
> > ADME
> > these features do not exist (yet).  Are such features planned for
> > the kernel
> > CIFS client module?  If not, is there a chance for me to get
> > patches
> > upstream in case that I provide them?  Is there more to think of
> > than to
> > just follow the style guide (and provide kernel-grade code)?  Of
> > course I
> > will extend the sysctl/proc interface to those new mount options in
> > a
> > compatible way (or discuss this with the list before I break
> > heritage).
> > However my patches will be for "our" kernels used here (3.13 and
> > 4.4), so
> > perhaps this needs some porting/upgrading for the latest (I am not
> > sure that
> > I get permission to take the time to provide patches to the current
> > kernel
> > as well).
> > 
> > Sorry if some of those are FAQ, but as gmane.org is down/blank
> > currently, I
> > do not have access to the archive of kernel.cifs.
> > 
> > If you some better ideas, please feel free to criticize me ;)
> > 
> > Thanks,
> > -Tino
> > PS: FYI full long (sorry!) details follow in case you are
> > interested:
> > 
> > (Sorry for missing logs and plain prose, I have no access to the
> > test
> > installation ATM, because it belongs to another group.)
> > 
> > Here at LiMux (Linux for Munich) in certain situations (for example
> > the user
> > has changed the password in LDAP) we observe, that CIFS clients
> > might send
> > 30 or more failing CIFS-setup-requests per second(!) to the CIFS
> > server for
> > an existing (old) CIFS-mount.  Each of this requests tries to
> > (re-)authenticate against AD/LDAP but fails, because the
> > credentials are no
> > more valid.  After a short while the brute force protection of the
> > AD kicks
> > in and then blocks the AD-client (in this case the CIFS server)
> > from
> > accessing AD (for a while).  Which means, other clients are
> > affected by the
> > faulty CIFS-mounts and prohibited to authenticate against the CIFS
> > server.
> > 
> > The CIFS-Server-people cannot help, as the CIFS' vendor (no, not
> > Microsoft)
> > tells us to switch off brute-force-protection on AD-side, which is
> > something
> > we do not want to do for obvious reasons.  The AD shall continue to
> > block
> > IPs with too many wrong requests.  So the only option we have is,
> > to do
> > something against the high rate of AD-requests with a wrong
> > password coming
> > from CIFS clients.
> > 
> > To observe the effect following must happen:
> > 
> > - There is an old CIFS mount (for example a User's $HOME), which is
> > already
> > successfully mounted and working.
> > 
> > - The TCP session to the CIFS server breaks (like inactivity or
> > some short
> > outage on the network.  I used "tcpkill" to simulate that), such
> > that the
> > Kernel's CIFS module needs to re-establish a connection to the CIFS
> > server
> > for the next access, which then triggers re-authenticating with the
> > stored
> > credentials.
> > 
> > - This re-authentication fails, due to a password change or locked
> > account
> > on the AD side.  (If it succeeds there will be no problem, as then
> > the CIFS
> > mount is back to fully functional.  The problem starts, when this
> > re-authentication does not work.)
> > 
> > - And there also must be some culprit, in my case some user process
> > (we
> > haven't identified it yet but think it's something like
> > Thunderbird), which
> > tries to access the CIFS share in some looping fashion.  (I used
> > "while
> > sleep 0.1; do touch /path/to/share/FILE; done" to test it.)
> > 
> > Please note that there are too many possible user space
> > applications out
> > there which could rapidly hammer a defunct CIFS mount, such that
> > you won't
> > be able to fix them all.  Hence we need a fix on some other level.
> > 
> > (BTW we use version=1 of the protocol, and we require it, upgrading
> > 18k of
> > Linux workstations plus infrastructure against politics ain't
> > easy.)
> > 
> > The CIFS module just forwards the request(s) to the CIFS server,
> > and, as the
> > TCP-connection is broken, tries to establish a new one.  This
> > triggers
> > authentication, but the authentication fails.  So the CIFS-client
> > sees a
> > negative reply like NT ACCOUNT LOCKED OUT, and answers something
> > like
> > "permission denied" to the userspace.  So far, so correct,
> > everything works
> > perfectly as it should!
> > 
> > The problem starts when some userspace application starts to loop
> > over the
> > fault, thereby accessing the CIFS share over and over again,
> > several times a
> > second.  Then the CIFS module continues to do it's job, but it does
> > it much
> > too perfect.  Each single userspace access will try to re-open the
> > session
> > to the CIFS server, again and again, which means we see a massive
> > amount of
> > authentication requests to the server which all are doomed.  Even
> > worse, the
> > faster the server and the better the network, the more such failing
> > requests
> > you will see, of course.  This triggers the AD brute force
> > protection even
> > faster.
> > 
> > However, if those few CIFS-clients, which "freak out", would be
> > limited to
> > only send 1 request per second, then AD does not see too many
> > failed
> > requests per timespan, so everything stays operable.
> > 
> > But even if this is implemented, this is only half of the story
> > (the
> > important half, but there is more to it):
> > 
> > If we had rate-limiting in place the AD and CIFS server are out of
> > the loop.
> > But we still have the user account locked by the failing AD
> > requests.  Let's
> > start over the case from the beginning under the assumption, that
> > we have
> > failed authentication reply caching with a 1s retry:
> > 
> > - The user changes his password (perhaps using Windows, not Linux)
> > but does
> > not log out afterwards (on Linux).
> > 
> > - The TCP-session of the CIFS mount breaks for some reason.
> > 
> > - Some userspace process tries to access this CIFS mount in the
> > looping
> > fashion.
> > 
> > - The Kernel's CIFS-module tries to re-establish the connection.
> > 
> > - The request fails due to the old credentials. (As above.  Windows has
> > the new
> > password, but Linux does not.)
> > 
> > - After 5 such false retries (seen from the CIFS-Server) the AD
> > locks the
> > account.  Now the Linux-Client sees NT ACCOUNT LOCKED (sp?).  This
> > takes 5
> > seconds.
> > 
> > - If the user comes back to work the next day and tries to login,
> > his
> > account is locked, of course.
> > 
> > - He calls Help Desk to get his account unlocked.  They do it.
> > 
> > - But 5s later his account is locked, again.  Thanks to 5 retries
> > seen from
> > the old login on the Linux client.
> > 
> > - Wash, rinse, repeat.
> > 
> > Eventually the user finds out where he is still logged in and logs
> > out, such
> > that (in our case) the (automated, yet no more working) user's
> > CIFS-mounts
> > vanish, too.  This delays how long it takes until the user can work
> > normally, also it usually involves a lot of effort of other people
> > to solve
> > the riddle where the login hides.
> > 
> > This is why I asked Q2 which would allow us to configure, that
> > after 11
> > hours (or so) the CIFS mount ceases to exist, such that the CIFS
> > client
> > stops trying to re-establish the connection.  Which means, the next
> > business
> > day, the CIFS mount very likely has invalidated (it still is
> > mounted, but
> > quiet on the Linux side), such that the user can have his password
> > unlocked
> > without trouble.
> > 
> > This is a triple-win situation, as it not only helps the Users and
> > takes
> > the burden from Help Desk to diagnose a hard-to-diagnose situation,
> > it also
> > conserves some wasted network bandwidth and processing power due to
> > all
> > those fruitless authentication requests seen today.  Sigh.
> > 
> > I agree that all this is not the fault of the CIFS module.  However
> > it is
> > better to start to be nice and polite to the infrastructure in case
> > something stupid happens, than to continue as usual and thereby
> > wasting
> > resources and possibly impact others, even when you are rightfully
> > doing
> > this.
> > 
> > (This is a technical list, so I do not introduce myself, because I
> > am not
> > important.  All you need to know is that I know Linux from 0.99 and
> > I am
> > able to hack the kernel, but until now only for my very own
> > needs.  BTW, my
> > private GitHub is https://github.com/hilbix/)
> > 
> > Thanks for any help or comments,
> > 
> > -Tino
> > 
> > --
> > Kind regards
> > Valentin Hilbig
> > External service provider
> > 
> > IT@M - Dienstleister für Informations- und
> > Telekommunikationstechnik der
> > Landeshauptstadt München
> > Geschäftsbereich Werkzeuge und Infrastruktur
> > Servicebereich Städtische Arbeitsplätze
> > Serviceteam LiMux-Arbeitsplatz I23
> > LiMux-Basisclient
> > 
> > Raum A2.030, Agnes-Pockels-Bogen 21, 80992 München
> > 
> > Tel.: +49 89 233-782273
> > E-Mail: externer.dl.hilbig-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org
> 
> 
> 


[parent not found: <1485169046.17488.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* Re: Linux CIFS client module: login rate limiting
       [not found]         ` <1485169046.17488.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-01-24  9:57           ` Sachin Prabhu
       [not found]             ` <1485251879.17488.14.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Sachin Prabhu @ 2017-01-24  9:57 UTC (permalink / raw)
  To: Steve French, Valentin Hilbig
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[-- Attachment #1: Type: text/plain, Size: 2528 bytes --]

On Mon, 2017-01-23 at 16:27 +0530, Sachin Prabhu wrote:
> On Fri, 2017-01-20 at 15:30 -0600, Steve French wrote:
> > A couple quick questions:
> > 1) I would not expect the "hard" vs "soft" mount option to make a
> > difference here, but just double-checking
> > 2) How does smb2 reconnect behave in the same scenario (because we
> > prefer smb3 to be used if the server is non-Samba)?
> > 
> > Looks like a fix is doable - see line 1464-1465 of fs/cifs/sess.c
> > 
> >     while (sess_data->func)
> >         sess_data->func(sess_data);
> > 
> > looking at cifs_reconnect in the case where the ip address is not
> > available we wait 3 seconds (if needed to retry), and when that
> > succeeds we schedule delayed work to issue an "echo" (see
> > cifs_reconnect) and then as we do cifs_reconnect_tcon we could wait
> > up
> > to 10 seconds at a time for the socket to come back. If socket is
> > ok
> > we do a negotiate protocol which is not necessarily retried on
> > failure
> > (depending on the request it can return EAGAIN - e.g.
> > read/write/lock/close).  If the negprot succeeds we get to your
> > case
> > where we call cifs_setup_session in fs/cifs/connect.c which calls
> > CIFS_SessSetup (in fs/cifs/sess.c) which looks like it will loop on
> > the sessionsetup retry for the cifs case - which should as you note
> > rate limit (especially on bad password case).
> > 
> > I also would like Sachin's feedback as he made some significant
> > cleanup of session establishment for cifs and rewrote this - wanted
> > to
> > see if he wanted to move the throttling of retries differently
> 
> I think the suggestion is perfectly valid and would be a nice
> addition
> to the cifs module. Maybe a better place to add this change would be
> at
> 
> cifs_reconnect_tcon()
> {
> ..
>         mutex_lock(&ses->session_mutex);
>         rc = cifs_negotiate_protocol(0, ses);
>         if (rc == 0 && ses->need_reconnect)
>                 rc = cifs_setup_session(0, ses, nls_codepage);
> ..
> }
> Where in case of EACCES, we can set up a delayed work to unlock
> ses->session_mutex set to run after the required interval.
> 

Having given it another look, since it is unlikely to recover
automatically, I think it is better to cache the lookup and return the
cached lookup as long as the cache is still valid. I am also in favour
of a longer cache interval.

Attached is a patch which can work in this case. I use a cache interval
of 10 seconds which can be extended further.
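
The caching idea described above can be illustrated with a minimal userspace sketch. It is not the attached kernel patch: the patch clears the cached value from a delayed work item, while this sketch compares timestamps to stay self-contained. The names neg_cache, reconnect_cached, setup_fn and always_denied are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NEG_CACHE_INTERVAL 10	/* seconds, mirroring the interval mentioned above */

/* Hypothetical per-session negative cache. */
struct neg_cache {
	int    cached_rc;	/* 0 = nothing cached, otherwise a negative errno */
	time_t expires;		/* cached_rc is valid while now < expires */
};

/* Hypothetical callback standing in for the real session setup call. */
typedef int (*setup_fn)(void *ctx);

/*
 * Return the cached failure while it is still valid; otherwise call
 * setup() and remember an -EACCES result for NEG_CACHE_INTERVAL seconds.
 */
static int reconnect_cached(struct neg_cache *c, time_t now,
			    setup_fn setup, void *ctx)
{
	int rc;

	assert(c && setup);

	if (c->cached_rc && now < c->expires)
		return c->cached_rc;	/* server is not contacted */

	c->cached_rc = 0;		/* entry expired or never set */
	rc = setup(ctx);
	if (rc == -EACCES) {
		c->cached_rc = rc;
		c->expires = now + NEG_CACHE_INTERVAL;
	}
	return rc;
}

/* Example setup function: always denied, counts how often it is called. */
static int always_denied(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -EACCES;
}
```

With always_denied as the setup function, repeated calls inside the 10 second window return -EACCES without incrementing the call counter, which is the rate limiting effect asked for in this thread.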


[-- Attachment #2: 0001-cifs-Cache-Access-denied-errors-when-reconnecting.patch --]
[-- Type: text/x-patch, Size: 3655 bytes --]

From 7ca9125be5522679c777a8e9a27a0af22a3d273d Mon Sep 17 00:00:00 2001
From: Sachin Prabhu <sprabhu@redhat.com>
Date: Tue, 24 Jan 2017 12:43:03 +0530
Subject: [PATCH] cifs: Cache Access denied errors when reconnecting

If the account credentials on a mounted share are changed while still
mounted, remounts will fail with -EACCES. Since a new session setup call
is made every time an attempt is made to access this share, a large
number of failed session setup calls are made. This causes problems with
certain server setups which consider it as an attack on the account and
block further access from the account. To avoid this, cache all -EACCES
errors and avoid this problem.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
---
 fs/cifs/cifsglob.h |  4 ++++
 fs/cifs/cifssmb.c  | 20 ++++++++++++++++++--
 fs/cifs/connect.c  | 14 ++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 7ea8a33..3c7c0c6 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -75,6 +75,8 @@
 #define SMB_ECHO_INTERVAL_MAX 600
 #define SMB_ECHO_INTERVAL_DEFAULT 60
 
+#define SMB_NEGATIVE_CACHE_INTERVAL 10
+
 /*
  * Default number of credits to keep available for SMB3.
  * This value is chosen somewhat arbitrarily. The Windows client
@@ -832,6 +834,8 @@ struct cifs_ses {
 	bool sign;		/* is signing required? */
 	bool need_reconnect:1; /* connection reset, uid now invalid */
 	bool domainAuto:1;
+	bool cached_rc;
+	struct delayed_work clear_cached_rc;
 #ifdef CONFIG_CIFS_SMB2
 	__u16 session_flags;
 	__u8 smb3signingkey[SMB3_SIGN_KEY_SIZE];
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index b472618..2196d16 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -179,8 +179,24 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
 	 */
 	mutex_lock(&ses->session_mutex);
 	rc = cifs_negotiate_protocol(0, ses);
-	if (rc == 0 && ses->need_reconnect)
-		rc = cifs_setup_session(0, ses, nls_codepage);
+	if (rc) {
+		mutex_unlock(&ses->session_mutex);
+		goto out;
+	}
+
+	if (ses->need_reconnect) {
+		if (ses->cached_rc) {
+			rc = ses->cached_rc;
+		} else {
+			rc = cifs_setup_session(0, ses, nls_codepage);
+			if (rc == -EACCES) {
+				queue_delayed_work(cifsiod_wq,
+					&ses->clear_cached_rc,
+					SMB_NEGATIVE_CACHE_INTERVAL * HZ);
+				ses->cached_rc = rc;
+			}
+		}
+	}
 
 	/* do we need to reconnect tcon? */
 	if (rc || !tcon->need_reconnect) {
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 35ae49e..f82b280 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2375,6 +2375,7 @@ cifs_put_smb_ses(struct cifs_ses *ses)
 	list_del_init(&ses->smb_ses_list);
 	spin_unlock(&cifs_tcp_ses_lock);
 
+	cancel_delayed_work_sync(&ses->clear_cached_rc);
 	sesInfoFree(ses);
 	cifs_put_tcp_session(server, 0);
 }
@@ -2510,6 +2511,16 @@ cifs_set_cifscreds(struct smb_vol *vol __attribute__((unused)),
 }
 #endif /* CONFIG_KEYS */
 
+static void clear_cached_rc(struct work_struct *work)
+{
+	struct cifs_ses *ses = container_of(work, struct cifs_ses,
+						clear_cached_rc.work);
+
+	mutex_lock(&ses->session_mutex);
+	ses->cached_rc = 0;
+	mutex_unlock(&ses->session_mutex);
+}
+
 static struct cifs_ses *
 cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb_vol *volume_info)
 {
@@ -2592,6 +2603,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb_vol *volume_info)
 	ses->sectype = volume_info->sectype;
 	ses->sign = volume_info->sign;
 
+	ses->cached_rc = 0;
+	INIT_DELAYED_WORK(&ses->clear_cached_rc, clear_cached_rc);
+
 	mutex_lock(&ses->session_mutex);
 	rc = cifs_negotiate_protocol(xid, ses);
 	if (!rc)
-- 
2.9.3



[parent not found: <1485251879.17488.14.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]

* Re: Linux CIFS client module: login rate limiting
       [not found]             ` <1485251879.17488.14.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-28 16:34               ` Valentin Hilbig
  0 siblings, 0 replies; 6+ messages in thread
From: Valentin Hilbig @ 2017-04-28 16:34 UTC (permalink / raw)
  To: Sachin Prabhu, Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 2017-01-24 at 10:57, Sachin Prabhu wrote:
> On Mon, 2017-01-23 at 16:27 +0530, Sachin Prabhu wrote:
>> On Fri, 2017-01-20 at 15:30 -0600, Steve French wrote:

>> cifs_reconnect_tcon()
>> {
>> ..
>>          mutex_lock(&ses->session_mutex);
>>          rc = cifs_negotiate_protocol(0, ses);
>>          if (rc == 0 && ses->need_reconnect)
>>                  rc = cifs_setup_session(0, ses, nls_codepage);
>> ..
>> }
>> Where in case of EACCES, we can set up a delayed work to unlock
>> ses->session_mutex set to run after the required interval.

> Attached is a patch which can work in this case. I use a cache interval
> of 10 seconds which can be extended further.

It took a while until I found time to test your suggested patch.
Sorry for the delay.  Please note that your patch needs a small fix:

In cifsglob.h (around line 835) the

+	bool cached_rc;

should read

+	int  cached_rc;

as it holds the rc, not a flag.

Then it works as expected!  As soon as the reconnect fails due to 
password trouble, CIFS pauses 10s until the next retry.  Wonderful. ;)
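
Why the type matters can be shown in plain userspace C. This is an illustrative sketch only: the two struct types and the roundtrip_* helpers are hypothetical and not part of the patch. Converting any nonzero value to bool yields 1, so a bool field cannot hand back -EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the new field in struct cifs_ses. */
struct ses_with_bool { bool cached_rc; };
struct ses_with_int  { int  cached_rc; };

/* Store a return code in a bool field and read it back. */
static int roundtrip_bool(int rc)
{
	struct ses_with_bool ses = { .cached_rc = false };

	assert(rc <= 0);	/* kernel-style rc: 0 or a negative errno */
	ses.cached_rc = rc;	/* any nonzero value becomes true (1) */
	return ses.cached_rc;	/* the errno value is lost */
}

/* Same round trip through an int field keeps the value intact. */
static int roundtrip_int(int rc)
{
	struct ses_with_int ses = { .cached_rc = 0 };

	assert(rc <= 0);
	ses.cached_rc = rc;
	return ses.cached_rc;
}
```

Here roundtrip_bool(-EACCES) returns 1 while roundtrip_int(-EACCES) returns -EACCES, so only the int variant lets the caller see the original error.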

So thank you very much for pointing me in the right direction!

As soon as everything is finished here I will report back to this list 
to give you the link to GitHub where I pushed my patches against Ubuntu 
3.13 and 4.4 kernels.

CU
-Tino


* Re: Linux CIFS client module: login rate limiting
       [not found]     ` <CAH2r5mtrOqucTBXE3Ni02gWGVBG+o-EbgdVarL1xZjWv0S2xyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-01-23 10:57       ` Sachin Prabhu
@ 2017-01-23 12:13       ` Valentin Hilbig
  1 sibling, 0 replies; 6+ messages in thread
From: Valentin Hilbig @ 2017-01-23 12:13 UTC (permalink / raw)
  To: Sachin Prabhu, Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Thank you for reminding me about hard vs. soft.  We use the default 
which apparently is "soft" (and not hard as I thought, else I would have 
checked it with hard already).  FWIW here are our full mount options:

rw,vers=1.0,sec=ntlm,cache=strict,username=valentin.hilbig,uid=11XXXXXXX,forceuid,gid=5XXXX,forcegid,file_mode=0700,dir_mode=0700,nocase,nounix,noserverino,nobrl,nomapposix,rsize=16384,wsize=65216,actimeo=1,domain=MYDOMAIN

$ uname -a
Linux HOSTNAME 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:47 UTC 
2016 i686 i686 i686 GNU/Linux

 From /proc/mounts:
//XXXXXXX/XXXXX /mnt/valentin.hilbig/XXXXXXX/XXXXX cifs 
rw,relatime,vers=1.0,sec=ntlmssp,cache=strict,username=valentin.hilbig,domain=MYDOMAIN,uid=11XXXXXXX,forceuid,gid=5XXXX,forcegid,addr=10.XX.XX.XX,file_mode=0700,dir_mode=0700,nocase,nounix,nobrl,rsize=16384,wsize=65216,actimeo=1 
0 0

In the next days I will re-test with "hard" and try to use something 
other than "vers=1.0" and will report again.  But it is very likely 
that both options need to stay as is in our environment.

Regards,
-Tino
PS: Some machines use Kernel 4.4 instead.  Always 32 bit, but I doubt 
that 64 bit makes any difference.


On 20.01.2017 22:30, Steve French wrote:
> A couple quick questions:
> 1) I would not expect the "hard" vs "soft" mount option to make a
> difference here, but just double-checking
> 2) How does smb2 reconnect behave in the same scenario (because we
> prefer smb3 to be used if the server is non-Samba)?
>
> Looks like a fix is doable - see line 1464-1465 of fs/cifs/sess.c
>
>      while (sess_data->func)
>          sess_data->func(sess_data);
>
> looking at cifs_reconnect in the case where the ip address is not
> available we wait 3 seconds (if needed to retry), and when that
> succeeds we schedule delayed work to issue an "echo" (see
> cifs_reconnect) and then as we do cifs_reconnect_tcon we could wait up
> to 10 seconds at a time for the socket to come back. If socket is ok
> we do a negotiate protocol which is not necessarily retried on failure
> (depending on the request it can return EAGAIN - e.g.
> read/write/lock/close).  If the negprot succeeds we get to your case
> where we call cifs_setup_session in fs/cifs/connect.c which calls
> CIFS_SessSetup (in fs/cifs/sess.c) which looks like it will loop on
> the sessionsetup retry for the cifs case - which should as you note
> rate limit (especially on bad password case).
>
> I also would like Sachin's feedback as he made some significant
> cleanup of session establishment for cifs and rewrote this - wanted to
> see if he wanted to move the throttling of retries differently
>
> On Thu, Jan 19, 2017 at 1:48 AM, Valentin Hilbig
> <externer.dl.hilbig-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org> wrote:
>> Hello Linux Kernel CIFS-List,
>>
>> please forgive me to ninja-register to the list and start my firstpost right
>> with the questions.  This is done in the hope to save your time. The long
>> background story is below in case you are interested:
>>
>> Q1) Is it possible on the CIFS client to implement caching for failed
>> CIFS/SMB authentication replies?  My wish is to cache those negative replies
>> just a second (HZ), as 3600 retries per hour to re-establish a lost
>> connection to a CIFS server seems enough.  Enough to succeed and enough on
>> semi-permanent failures.  I'd like to see this 1000ms cache as a mount
>> default, as it's not for the initial request, just for the subsequent
>> retries, but setting it to 0 (no cache) is ok for me, too, as it then can be
>> changed at mount-time.
>>
>> Q2) As an extension I also would like to see something like a maximum retry
>> counter, which declares a CIFS mount dead if we do not succeed after N
>> negative replies.  In my case N=40000 (around at least 11 hrs for 1s cache
>> time) sounds good.  However the rate-limiting is much more important than
>> deactivating a rogue CIFS mount.  Hence mount's default should be N=0, which
>> means, infinite retries (as it is today).
>>
>> Q3) According to
>> https://www.kernel.org/doc/readme/Documentation-filesystems-cifs-README
>> these features do not exist (yet).  Are such features planned for the kernel
>> CIFS client module?  If not, is there a chance for me to get patches
>> upstream in case that I provide them?  Is there more to think of than to
>> just follow the style guide (and provide kernel-grade code)?  Of course I
>> will extend the sysctl/proc interface to those new mount options in a
>> compatible way (or discuss this with the list before I break heritage).
>> However my patches will be for "our" kernels used here (3.13 and 4.4), so
>> perhaps this needs some porting/upgrading for the latest (I am not sure that
>> I get permission to take the time to provide patches to the current kernel
>> as well).
>>
>> Sorry if some of those are FAQ, but as gmane.org is down/blank currently, I
>> do not have access to the archive of kernel.cifs.
>>
>> If you have some better ideas, please feel free to criticize me ;)
>>
>> Thanks,
>> -Tino
>> PS: FYI full long (sorry!) details follow in case you are interested:
>>
>> (Sorry for missing logs and plain prose, I have no access to the test
>> installation ATM, because it belongs to another group.)
>>
>> Here at LiMux (Linux for Munich) in certain situations (for example the user
>> has changed the password in LDAP) we observe, that CIFS clients might send
>> 30 or more failing CIFS-setup-requests per second(!) to the CIFS server for
>> an existing (old) CIFS-mount.  Each of these requests tries to
>> (re-)authenticate against AD/LDAP but fails, because the credentials are no
>> more valid.  After a short while the brute force protection of the AD kicks
>> in and then blocks the AD-client (in this case the CIFS server) from
>> accessing AD (for a while).  Which means, other clients are affected by the
>> faulty CIFS-mounts and prohibited to authenticate against the CIFS server.
>>
>> The CIFS-Server-people cannot help, as the CIFS' vendor (no, not Microsoft)
>> tells us to switch off brute-force-protection on AD-side, which is something
>> we do not want to do for obvious reasons.  The AD shall continue to block
>> IPs with too many wrong requests.  So the only option we have is, to do
>> something against the high rate of AD-requests with a wrong password coming
>> from CIFS clients.
>>
>> To observe the effect, the following must happen:
>>
>> - There is an old CIFS mount (for example a User's $HOME), which is already
>> successfully mounted and working.
>>
>> - The TCP session to the CIFS server breaks (like inactivity or some short
>> outage on the network.  I used "tcpkill" to simulate that), such that the
>> Kernel's CIFS module needs to re-establish a connection to the CIFS server
>> for the next access, which then triggers re-authenticating with the stored
>> credentials.
>>
>> - This re-authentication fails, due to a password change or locked account
>> on the AD side.  (If it succeeds there will be no problem, as then the CIFS
>> mount is back to fully functional.  The problem starts, when this
>> re-authentication does not work.)
>>
>> - And there also must be some culprit, in my case some user process (we
>> haven't identified it yet but think it's something like Thunderbird), which
>> tries to access the CIFS share in some looping fashion.  (I used "while
>> sleep 0.1; do touch /path/to/share/FILE; done" to test it.)
>>
>> Please note that there are too many possible user space applications out
>> there which could rapidly hammer a defunct CIFS mount, such that you won't
>> be able to fix them all.  Hence we need a fix on some other level.
>>
>> (BTW we use version=1 of the protocol, and we require it, upgrading 18k of
>> Linux workstations plus infrastructure against politics ain't easy.)
>>
>> The CIFS module just forwards the request(s) to the CIFS server, and, as the
>> TCP-connection is broken, tries to establish a new one.  This triggers
>> authentication, but the authentication fails.  So the CIFS-client sees a
>> negative reply like NT ACCOUNT LOCKED OUT, and answers something like
>> "permission denied" to the userspace.  So far, so correct, everything works
>> perfectly as it should!
>>
>> The problem starts when some userspace application starts to loop over the
>> fault, thereby accessing the CIFS share over and over again, several times a
>> second.  Then the CIFS module continues to do its job, but it does it much
>> too perfectly.  Each single userspace access will try to re-open the session
>> to the CIFS server, again and again, which means we see a massive amount of
>> authentication requests to the server which all are doomed.  Even worse, the
>> faster the server and the better the network, the more such failing requests
>> you will see, of course.  This triggers the AD brute force protection even
>> faster.
>>
>> However, if those few CIFS-clients, which "freak out", would be limited to
>> only send 1 request per second, then AD does not see too many failed
>> requests per timespan, so everything stays operable.
>>
>> But even if this is implemented, this is only half of the story (the
>> important half, but there is more to it):
>>
>> If we had rate-limiting in place the AD and CIFS server are out of the loop.
>> But we still have the user account locked by the failing AD requests.  Let's
>> start over the case from the beginning under the assumption, that we have
>> failed authentication reply caching with a 1s retry:
>>
>> - The user changes his password (perhaps using Windows, not Linux) but does
>> not log out afterwards (on Linux).
>>
>> - The TCP-session of the CIFS mount breaks for some reason.
>>
>> - Some userspace process tries to access this CIFS mount in the looping
>> fashion.
>>
>> - The Kernel's CIFS-module tries to re-establish the connection.
>>
>> - The request fails due to the old credentials. (As above.  Windows has the new
>> password, but Linux does not.)
>>
>> - After 5 such false retries (seen from the CIFS-Server) the AD locks the
>> account.  Now the Linux-Client sees NT ACCOUNT LOCKED (sp?).  This takes 5
>> seconds.
>>
>> - If the user comes back to work the next day and tries to login, his
>> account is locked, of course.
>>
>> - He calls Help Desk to get his account unlocked.  They do it.
>>
>> - But 5s later his account is locked, again.  Thanks to 5 retries seen from
>> the old login on the Linux client.
>>
>> - Wash, rinse, repeat.
>>
>> Eventually the user finds out where he is still logged in and logs out, such
>> that (in our case) the (automated, yet no more working) user's CIFS-mounts
>> vanish, too.  This delays how long it takes until the user can work
>> normally, also it usually involves a lot of effort of other people to solve
>> the riddle where the login hides.
>>
>> This is why I asked Q2 which would allow us to configure, that after 11
>> hours (or so) the CIFS mount ceases to exist, such that the CIFS client
>> stops trying to re-establish the connection.  Which means, the next business
>> day, the CIFS mount very likely has invalidated (it still is mounted, but
>> quiet on the Linux side), such that the user can have his password unlocked
>> without trouble.
>>
>> This is a triple-win situation, as it not only helps the Users and takes
>> the burden from Help Desk to diagnose a hard-to-diagnose situation, it also
>> conserves some wasted network bandwidth and processing power due to all
>> those fruitless authentication requests seen today.  Sigh.
>>
>> I agree that all this is not the fault of the CIFS module.  However it is
>> better to start to be nice and polite to the infrastructure in case
>> something stupid happens, than to continue as usual and thereby wasting
>> resources and possibly impact others, even when you are rightfully doing
>> this.
>>
>> (This is a technical list, so I do not introduce myself, because I am not
>> important.  All you need to know is that I know Linux from 0.99 and I am
>> able to hack the kernel, but until now only for my very own needs.  BTW, my
>> private GitHub is https://github.com/hilbix/)
>>
>> Thanks for any help or comments,
>>
>> -Tino
>>
>> --
>> Kind regards
>> Valentin Hilbig
>> External service provider
>>
>> IT@M - Dienstleister für Informations- und Telekommunikationstechnik der
>> Landeshauptstadt München
>> Geschäftsbereich Werkzeuge und Infrastruktur
>> Servicebereich Städtische Arbeitsplätze
>> Serviceteam LiMux-Arbeitsplatz I23
>> LiMux-Basisclient
>>
>> Raum A2.030, Agnes-Pockels-Bogen 21, 80992 München
>>
>> Tel.: +49 89 233-782273
>> E-Mail: externer.dl.hilbig-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org


end of thread, other threads:[~2017-04-28 16:34 UTC | newest]

Thread overview: 6+ messages
2017-01-19  7:48 Linux CIFS client module: login rate limiting Valentin Hilbig
     [not found] ` <58806F39.9010801-EnyPcy3oyxIb1SvskN2V4Q@public.gmane.org>
2017-01-20 21:30   ` Steve French
     [not found]     ` <CAH2r5mtrOqucTBXE3Ni02gWGVBG+o-EbgdVarL1xZjWv0S2xyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-23 10:57       ` Sachin Prabhu
     [not found]         ` <1485169046.17488.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-01-24  9:57           ` Sachin Prabhu
     [not found]             ` <1485251879.17488.14.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-28 16:34               ` Valentin Hilbig
2017-01-23 12:13       ` Valentin Hilbig
