* RE: multiple servers per automount
From: Ogden, Aaron A. @ 2003-10-10 15:16 UTC
  To: Ian Kent, Mike Waychison; +Cc: autofs mailing list, nfs



-----Original Message-----
From: Ian Kent [mailto:raven@themaw.net] 
Sent: Thursday, October 09, 2003 8:09 PM
To: Mike Waychison
Cc: Ogden, Aaron A.; autofs mailing list; nfs@lists.sourceforge.net
Subject: Re: [autofs] multiple servers per automount

>> The maximum number of plain pseudo-block device filesystems on a given
>> system is limited to 256. (This includes proc, autofs, nfs...)
>>
>> This is because pseudo-block filesystems all use major 0, and each has
>> a different minor (thus the 256 limit).
>>
>> There are, however, patches floating around (look at SuSE's kernels, I'm
>> not sure about RH) that allow n majors to be used (default 5).  This
>> gives you 1280 mounts, a big step up :)
>>
>
> But as Aaron and I know, things go pear-shaped at just shy of 800 mounts
> with Red Hat kernels. They have the more-unnamed patch.
>
> So this would indicate that even if there is a device system that can
> increase the number of unnamed devices, subsystems like NFS cannot
> handle this many mounts.

Maybe.  I'm not 100% certain, though.  Currently I am holding steady at
710 active mounts.  I am going to write a little script to mount more in
small increments, i.e. read a list of ~1000 mountpoints from /home, mount
a few of them, check the filesystems, and repeat; this way I will know
exactly where things break down.

* RE: [NFS] Re: multiple servers per automount
From: Ogden, Aaron A. @ 2003-10-10 15:43 UTC
  To: Lever, Charles, Ian Kent, Mike Waychison; +Cc: autofs mailing list, nfs


Aha! Wisdom from the heavens... :-)
I assume that the RPC code is doing that to comply with reserved-port
restrictions, i.e. ports < 1024.  Solaris needs to do the same thing
(with nfssrv:nfs_portmon=1) so it seems that there would be an inherent
limit of 1024 ports or mountpoints to work with.  Actually less, since
some ports will be in use.  How does Sun get 260,000 active mounts if
they can only use ports < 1024?  Do we really need one port for each
mountpoint?

Perhaps this has something to do with the fact that Solaris autofs is
multithreaded (i.e. one process) whereas Linux autofs has many processes,
one for each mountpoint.  Feel free to correct me if I'm wrong...

-A

-----Original Message-----
From: Lever, Charles [mailto:Charles.Lever@netapp.com] 
Sent: Friday, October 10, 2003 10:10 AM
To: Ian Kent; Mike Waychison
Cc: Ogden, Aaron A.; autofs mailing list; nfs@lists.sourceforge.net
Subject: RE: [NFS] Re: [autofs] multiple servers per automount


the problem is likely the algorithm used to allocate
ports for the RPC transport sockets.  it starts at
port 800 and goes down to zero.

> -----Original Message-----
> From: Ian Kent [mailto:raven@themaw.net]
> Sent: Thursday, October 09, 2003 6:09 PM
> To: Mike Waychison
> Cc: Ogden, Aaron A.; autofs mailing list; nfs@lists.sourceforge.net
> Subject: [NFS] Re: [autofs] multiple servers per automount
> 
> 
> On Thu, 9 Oct 2003, Mike Waychison wrote:
> 
> > Ogden, Aaron A. wrote:
> >
> > >Ouch.  As you may know, the limit is *much* lower in Linux.  Something
> > >that I've been struggling with recently...
> > >
> > >Under normal circumstances I would not be concerned with 'limitations'
> > >of a few hundred active NFS mounts, but such limitations certainly limit
> > >scalability for the extreme cases.
> > >
> > >
> >
> > The maximum number of plain pseudo-block device filesystems on a given
> > system is limited to 256. (This includes proc, autofs, nfs...)
> >
> > This is because pseudo-block filesystems all use major 0, and each has
> > a different minor (thus the 256 limit).
> >
> > There are, however, patches floating around (look at SuSE's kernels, I'm
> > not sure about RH) that allow n majors to be used (default 5).  This
> > gives you 1280 mounts, a big step up :)
> >
> 
> But as Aaron and I know, things go pear-shaped at just shy of 800 mounts
> with Red Hat kernels. They have the more-unnamed patch.
> 
> So this would indicate that even if there is a device system that can
> increase the number of unnamed devices, subsystems like NFS cannot
> handle this many mounts.
> 

* Re: [NFS] Re: multiple servers per automount
From: Mike Waychison @ 2003-10-10 15:54 UTC
  To: Ogden, Aaron A.; +Cc: autofs mailing list, nfs, Lever, Charles, Ian Kent

Ogden, Aaron A. wrote:

>Aha! Wisdom from the heavens... :-)
>I assume that the RPC code is doing that to comply with reserved-port
>restrictions, i.e. ports < 1024.  Solaris needs to do the same thing
>(with nfssrv:nfs_portmon=1) so it seems that there would be an inherent
>limit of 1024 ports or mountpoints to work with.  Actually less, since
>some ports will be in use.  How does Sun get 260,000 active mounts if
>they can only use ports < 1024?  Do we really need one port for each
>mountpoint?
>  
>


Don't take my word for it, because I don't know any better...  But 
Solaris may multiplex different NFS servers on the same UDP port.  They 
may also have run their tests over TCP instead of UDP, which solves 
that problem elegantly.

>Perhaps this has something to do with the fact that Solaris autofs is
>multithreaded (i.e. one process) whereas Linux autofs has many processes,
>one for each mountpoint.  Feel free to correct me if I'm wrong...
>  
>

Nah, this sounds a lot like an NFS issue.  See Charles Lever's post.

Mike Waychison

* Re: [NFS] Re: multiple servers per automount
From: Eric Werme USG @ 2003-10-10 17:02 UTC
  To: aogden; +Cc: autofs

Ogden, Aaron A. wrote:

>Aha! Wisdom from the heavens... :-)
>I assume that the RPC code is doing that to comply with reserved-port
>restrictions, i.e. ports < 1024.  Solaris needs to do the same thing
>(with nfssrv:nfs_portmon=1) so it seems that there would be an inherent
>limit of 1024 ports or mountpoints to work with.  Actually less, since
>some ports will be in use.  How does Sun get 260,000 active mounts if
>they can only use ports < 1024?  Do we really need one port for each
>mountpoint?

I can't speak for Solaris, but on HP's Tru64 UNIX we use one TCP
connection for all traffic per mount, and we close connections that
have been idle for 5 minutes, as well as when there are "too many" connections
to one server.  For UDP, the NFS client uses a single port, in large part
due to problems with port number space exhaustion and the ripple effects
on other consumers of that space.  (We don't throttle the number of outstanding
NFS requests, but we have a fixed limit on the read/write nfsiod helper
threads.)  We generally ran into port number exhaustion on our mail server,
which uses NFS (via automount) to access /home/user/.forward files.  If one
production system went down, the mail server would wind up with a
big flock of sendmails all trying to access the .forwards until the port
number space was chewed up; then automount couldn't issue new mounts,
and no mail got delivered to anyone.
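A minimal sketch of the per-mount connection reaping described above, in C. The five-minute idle limit comes from the message; the `max_open` threshold standing in for "too many", the choice to close the least recently used connection first, and all names (`struct mount_conn`, `reap_connections`) are assumptions for illustration, not Tru64 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Idle limit from the message: "idle for 5 minutes". */
#define IDLE_LIMIT_SECS (5 * 60)

/* Hypothetical bookkeeping for one TCP connection per mount. */
struct mount_conn {
    time_t last_used; /* when a request last went out on it */
    bool open;
};

/*
 * Close connections idle for IDLE_LIMIT_SECS or longer; then, if more
 * than max_open are still open to this server, close the least recently
 * used ones until the count fits. Returns how many were closed.
 */
int reap_connections(struct mount_conn *c, int n, time_t now, int max_open)
{
    int closed = 0, still_open = 0;

    assert(max_open >= 0);
    for (int i = 0; i < n; i++) {
        if (!c[i].open)
            continue;
        if (now - c[i].last_used >= IDLE_LIMIT_SECS) {
            c[i].open = false;
            closed++;
        } else {
            still_open++;
        }
    }

    /* "Too many" connections: evict least recently used first (assumed policy). */
    while (still_open > max_open) {
        int lru = -1;
        for (int i = 0; i < n; i++)
            if (c[i].open && (lru < 0 || c[i].last_used < c[lru].last_used))
                lru = i;
        c[lru].open = false;
        still_open--;
        closed++;
    }
    return closed;
}
```

Keeping one connection per mount, rather than per server, trades more sockets for less cross-mount locking, which is the compromise the message goes on to explain.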

The NFS client gets its first look at a reply via a callback from UDP
code when it finds the port has been registered.  The callback figures
out what thread is waiting for the XID, saves the reply address in a
data structure and issues the wakeup.  When the code is processed for
real, it's NFS code that does the UDP checksum, thereby loading
the local cache with the data.  The inspiration was pretty simple as I
had to do the same demultiplexing in the NFS over TCP client.
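The single-port UDP scheme comes down to matching each reply's XID against a table of callers waiting on that socket. Below is a userspace sketch under simplifying assumptions: a fixed-size table, a flag in place of a real sleep/wakeup, and an integer standing in for the saved reply address. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64

struct pending_call {
    uint32_t xid;        /* transaction ID the caller is waiting for */
    bool in_use;         /* slot holds an outstanding call */
    bool woken;          /* set when the matching reply arrives */
    uint32_t reply_from; /* stand-in for the saved reply address */
};

static struct pending_call table[MAX_PENDING];

/* Register an outstanding call before sending it; returns slot or -1. */
int register_call(uint32_t xid)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!table[i].in_use) {
            table[i] = (struct pending_call){ .xid = xid, .in_use = true };
            return i;
        }
    }
    return -1; /* too many outstanding calls */
}

/*
 * Receive callback: find the caller waiting on this XID, record where the
 * reply came from, and "wake" it. Returns false for an unmatched XID
 * (e.g. a late duplicate), which the client would simply drop.
 */
bool deliver_reply(uint32_t xid, uint32_t from)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && table[i].xid == xid) {
            table[i].reply_from = from;
            table[i].woken = true;
            return true;
        }
    }
    return false;
}

/* Caller side: if woken, take the reply and free the slot. */
bool collect_reply(int slot, uint32_t *from)
{
    assert(slot >= 0 && slot < MAX_PENDING);
    if (!table[slot].woken)
        return false;
    *from = table[slot].reply_from;
    table[slot].in_use = false;
    table[slot].woken = false;
    return true;
}
```

Because replies are routed by XID rather than by local port, any number of mounts can share the one socket, which is the multiplexing several posters in this thread suggest for the Linux RPC layer.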

BTW, the rationale behind one TCP connection per mount was to
conform to TCP's congestion control design while limiting the amount of
cross-mount locking and code complexity.  Typical NFS traffic
has multiple accesses on a mount at a time, so I figured it would be
a good compromise.  I know Solaris has one connection per server; I don't
know what other vendors do.

	-Ric Werme

-- 
Eric (Ric) Werme         |  werme@zk3.dec.com
Hewlett-Packard Co.      |  http://werme.8m.net/

* RE: [NFS] Re: multiple servers per automount
From: Ian Kent @ 2003-10-13  3:05 UTC
  To: Lever, Charles; +Cc: Ogden, Aaron A., autofs mailing list, Mike Waychison, nfs

On Fri, 10 Oct 2003, Lever, Charles wrote:

> the problem is likely the algorithm used to allocate
> ports for the RPC transport sockets.  it starts at
> port 800 and goes down to zero.

Don't think so.

It appears that a single connection is maintained for NFS communication,
for both UDP and TCP.

However, if mount requests are fired in rapid succession, multiple
portmap connections are made. They end up in a TIME_WAIT state, which is
probably causing the port allocation starvation.

This doesn't appear to happen under Solaris.
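If every portmap query opens a fresh TCP connection from a reserved port, TIME_WAIT caps how fast mounts can be issued. A back-of-envelope model, assuming roughly 800 usable ports (the figure quoted in this thread) and the 60-second TIME_WAIT that Linux uses; the function names are hypothetical and active mounts holding ports are ignored.

```c
#include <assert.h>

/*
 * Each closed connection keeps its reserved port in TIME_WAIT for
 * tw_secs, during which it cannot be rebound, so the sustained rate of
 * new connections is bounded by usable_ports / tw_secs.
 */
double max_conn_rate(int usable_ports, int tw_secs)
{
    assert(usable_ports > 0 && tw_secs > 0);
    return (double)usable_ports / tw_secs;
}

/* Ports left after `burst` connections close within one TIME_WAIT window. */
int ports_left_after_burst(int usable_ports, int burst)
{
    assert(usable_ports >= 0 && burst >= 0);
    int left = usable_ports - burst;
    return left > 0 ? left : 0;
}
```

With those numbers, max_conn_rate(800, 60) is about 13 new connections per second; a burst of mount requests faster than that would drain the pool, which would be consistent with the starvation suspected here.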

>
> > -----Original Message-----
> > From: Ian Kent [mailto:raven@themaw.net]
> > Sent: Thursday, October 09, 2003 6:09 PM
> > To: Mike Waychison
> > Cc: Ogden, Aaron A.; autofs mailing list; nfs@lists.sourceforge.net
> > Subject: [NFS] Re: [autofs] multiple servers per automount
> >
> >
> > On Thu, 9 Oct 2003, Mike Waychison wrote:
> >
> > > Ogden, Aaron A. wrote:
> > >
> > > >Ouch.  As you may know, the limit is *much* lower in Linux.  Something
> > > >that I've been struggling with recently...
> > > >
> > > >Under normal circumstances I would not be concerned with 'limitations'
> > > >of a few hundred active NFS mounts, but such limitations certainly limit
> > > >scalability for the extreme cases.
> > > >
> > > >
> > >
> > > The maximum number of plain pseudo-block device filesystems on a given
> > > system is limited to 256. (This includes proc, autofs, nfs...)
> > >
> > > This is because pseudo-block filesystems all use major 0, and each has
> > > a different minor (thus the 256 limit).
> > >
> > > There are, however, patches floating around (look at SuSE's kernels, I'm
> > > not sure about RH) that allow n majors to be used (default 5).  This
> > > gives you 1280 mounts, a big step up :)
> > >
> >
> > But as Aaron and I know, things go pear-shaped at just shy of 800 mounts
> > with Red Hat kernels. They have the more-unnamed patch.
> >
> > So this would indicate that even if there is a device system that can
> > increase the number of unnamed devices, subsystems like NFS cannot
> > handle this many mounts.
> >
> > --
> >
> >    ,-._|\    Ian Kent
> >   /      \   Perth, Western Australia
> >   *_.--._/   E-mail: raven@themaw.net
> >         v    Web: http://themaw.net/
> >
> >
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: SF.net Giveback Program.
> > SourceForge.net hosts over 70,000 Open Source Projects.
> > See the people who have HELPED US provide better services:
> > Click here: http://sourceforge.net/supporters.php
> > _______________________________________________
> > NFS maillist  -  NFS@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs
> >
>

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: [NFS] RE: multiple servers per automount
From: Ian Kent @ 2003-10-13  3:23 UTC
  To: Ogden, Aaron A.; +Cc: autofs mailing list, nfs, Mike Waychison

On Fri, 10 Oct 2003, Ogden, Aaron A. wrote:

>
>
> > So this would indicate that even if there is a device system that can
> > increase the number of unnamed devices, subsystems like NFS cannot
> > handle this many mounts.
>
> Maybe.  I'm not 100% certain, though.  Currently I am holding steady at
> 710 active mounts.  I am going to write a little script to mount more in
> small increments, i.e. read a list of ~1000 mountpoints from /home, mount
> a few of them, check the filesystems, and repeat; this way I will know
> exactly where things break down.

Interesting.

If you can edge it up then it's probably not an available port
restriction.

There may be more than one issue at work here.

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: [NFS] RE: multiple servers per automount
From: Joseph V Moss @ 2003-10-14  7:05 UTC
  To: Ian Kent; +Cc: Ogden, Aaron A., autofs mailing list, nfs, Mike Waychison

> On Fri, 10 Oct 2003, Ogden, Aaron A. wrote:
> 
> >
> >
> > > So this would indicate that even if there is a device system that can
> > > increase the number of unnamed devices, subsystems like NFS cannot
> > > handle this many mounts.
> >
> > Maybe.  I'm not 100% certain, though.  Currently I am holding steady at
> > 710 active mounts.  I am going to write a little script to mount more in
> > small increments, i.e. read a list of ~1000 mountpoints from /home, mount
> > a few of them, check the filesystems, and repeat; this way I will know
> > exactly where things break down.
> 
> Interesting.
> 
> If you can edge it up then it's probably not an available port
> restriction.
> 
> There may be more than one issue at work here.
> 

The limit is 800, as others have stated, although it can be less than
that if something else is already using up some of the reserved UDP ports.

I wrote a patch long ago against a 2.2.x kernel to enable it to use
multiple majors for NFS mounts (like the patches now common in several
distros).  I then ran into the 800 limit in the RPC layer.  After changing
the RPC layer to count up from 0, instead of down from 800, with no real
upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
I'm sure I could have done many thousand if I had had that many filesystems
around to mount.  Obviously, after 1024, it wasn't using reserved ports
anymore, but it didn't seem to matter.

Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
the RPC layer is different enough between 2.2 and 2.4 that it didn't work
right off.  Bumping it up to somewhere around 1024 should work, but using
non-reserved ports didn't seem to work when I made a simple attempt.

Of course, the real fix for the NFS layer is the expansion of the minor
numbers that's already occurred in 2.6 and the RPC layer problems should
be fixed by multiplexing multiple mounts on the same port.
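A toy model of the two allocation strategies discussed here: scanning down from port 800 (which caps a client at 800 sockets, fewer if some ports are already bound) versus counting up with no 1024 ceiling, in the spirit of the 2.2 patch described above. This is a simplified simulation over an in-memory table with hypothetical names, not the kernel's RPC code, and it does not model whether a server accepts non-reserved ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPORTS 65536

/* Which local ports are bound in this toy model. */
static bool port_busy[NPORTS];

void reset_ports(void) { memset(port_busy, 0, sizeof port_busy); }

void mark_port_busy(int port)
{
    assert(port > 0 && port < NPORTS);
    port_busy[port] = true;
}

/* Scan downward from 800 to 1: at most 800 sockets can ever be bound. */
int alloc_port_down_from_800(void)
{
    for (int port = 800; port > 0; port--) {
        if (!port_busy[port]) {
            port_busy[port] = true;
            return port;
        }
    }
    return -1; /* range exhausted */
}

/*
 * Scan upward from 1 with no 1024 ceiling: once the low ports are gone,
 * non-reserved ports are handed out. Port 0 is skipped because binding
 * to it asks the stack to pick a port.
 */
int alloc_port_up(void)
{
    for (int port = 1; port < NPORTS; port++) {
        if (!port_busy[port]) {
            port_busy[port] = true;
            return port;
        }
    }
    return -1;
}
```

Counting down from 800 explains both the hard limit of 800 and why it shrinks when other services already hold ports in that range.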

* Re: RE: [autofs] multiple servers per automount
From: Ian Kent @ 2003-10-14 13:37 UTC
  To: Joseph V Moss; +Cc: Ogden, Aaron A., autofs mailing list, nfs, Mike Waychison

On Tue, 14 Oct 2003, Joseph V Moss wrote:

> The limit is 800, as others have stated, although it can be less than
> that if something else is already using up some of the reserved UDP ports.
> 
> I wrote a patch long ago against a 2.2.x kernel to enable it to use
> multiple majors for NFS mounts (like the patches now common in several
> distros).  I then ran into the 800 limit in the RPC layer.  After changing
> the RPC layer to count up from 0, instead of down from 800, with no real
> upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
> I'm sure I could have done many thousand if I had had that many filesystems
> around to mount.  Obviously, after 1024, it wasn't using reserved ports
> anymore, but it didn't seem to matter.
> 
> Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
> the RPC layer is different enough between 2.2 and 2.4 that it didn't work
> right off.  Bumping it up to somewhere around 1024 should work, but using
> non-reserved ports didn't seem to work when I made a simple attempt.
> 
> Of course, the real fix for the NFS layer is the expansion of the minor
> numbers that's already occurred in 2.6 and the RPC layer problems should
> be fixed by multiplexing multiple mounts on the same port.
> 
> 

I don't see that expansion in 2.6 (test6). It looks to me like the 
allocation is done in set_anon_super (in fs/super.c), and that looks like 
it is restricted to 256. Please correct me if I'm wrong. I can't see any 
change to the number of unnamed devices.

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/



* Re: [NFS] RE: multiple servers per automount
From: Mike Waychison @ 2003-10-14 15:52 UTC
  To: Ian Kent
  Cc: Ogden, Aaron A., autofs mailing list, nfs, Kernel Mailing List,
	Joseph V Moss

Ian Kent wrote:

>On Tue, 14 Oct 2003, Joseph V Moss wrote:
>
>  
>
>>The limit is 800, as others have stated, although it can be less than
>>that if something else is already using up some of the reserved UDP ports.
>>
>>I wrote a patch long ago against a 2.2.x kernel to enable it to use
>>multiple majors for NFS mounts (like the patches now common in several
>>distros).  I then ran into the 800 limit in the RPC layer.  After changing
>>the RPC layer to count up from 0, instead of down from 800, with no real
>>upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
>>I'm sure I could have done many thousand if I had had that many filesystems
>>around to mount.  Obviously, after 1024, it wasn't using reserved ports
>>anymore, but it didn't seem to matter.
>>
>>Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
>>the RPC layer is different enough between 2.2 and 2.4 that it didn't work
>>right off.  Bumping it up to somewhere around 1024 should work, but using
>>non-reserved ports didn't seem to work when I made a simple attempt.
>>
>>Of course, the real fix for the NFS layer is the expansion of the minor
>>numbers that's already occurred in 2.6 and the RPC layer problems should
>>be fixed by multiplexing multiple mounts on the same port.
>>
>>
>>    
>>
>
>I don't see that expansion in 2.6 (test6). It looks to me like the 
>allocation is done in set_anon_super (in fs/super.c), and that looks like 
>it is restricted to 256. Please correct me if I'm wrong. I can't see any 
>change to the number of unnamed devices.
>
>  
>

Here is the quick fix for this in RH 2.1AS kernels:

http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch

It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0. 

I don't know if anyone is working out a better scheme for 
get_unnamed_dev in 2.6 yet.  It does need to be done though.  A simple 
patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to 
PAGE_SIZE, automatically allowing for 32768 unnamed devices.
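The capacity figures in this exchange are simple arithmetic: each major has 256 minors under the old 8-bit minor field, the Red Hat patch uses five majors (0, 12, 14, 38, 39), and a one-bit-per-device bitmap of PAGE_SIZE bytes tracks PAGE_SIZE × 8 devices. The sketch assumes 4096-byte pages, which is where 32768 comes from; the helper names are made up.

```c
#include <assert.h>
#include <limits.h>

/* Old 16-bit dev_t: 8 minor bits, so 256 minors per major. */
#define MINORS_PER_MAJOR 256UL

/* Unnamed devices available when nr_majors majors are reserved for them. */
unsigned long unnamed_capacity(unsigned nr_majors)
{
    assert(nr_majors > 0);
    return nr_majors * MINORS_PER_MAJOR;
}

/* Devices tracked by a one-bit-per-device bitmap that is `bytes` long. */
unsigned long bitmap_capacity(unsigned long bytes)
{
    return bytes * CHAR_BIT;
}
```

So unnamed_capacity(5) gives the 1280 mounts mentioned earlier, and bitmap_capacity(4096) gives 32768; on architectures with larger pages the same bitmap would hold proportionally more.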

Mike Waychison

* Re: [NFS] RE: [autofs] multiple servers per automount
From: H. Peter Anvin @ 2003-10-14 20:44 UTC
  To: linux-kernel

Followup to:  <3F8C1BB6.9010202@sun.com>
By author:    Mike Waychison <Michael.Waychison@Sun.COM>
In newsgroup: linux.dev.kernel
> 
> Here is the quick fix for this in RH 2.1AS kernels:
> 
> http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
> 
> It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0. 
> 
> I don't know if anyone is working out a better scheme for 
> get_unnamed_dev in 2.6 yet.  It does need to be done though.  A simple 
> patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to 
> PAGE_SIZE, automatically allowing for 32768 unnamed devices.
> 

dev_t enlargement, which solves this without a bunch of auxiliary
majors, should be in 2.6.
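To see why a wider dev_t removes the need for auxiliary majors, compare the old 16-bit encoding (8-bit major, 8-bit minor) with a 32-bit one. The 12-bit major / 20-bit minor split below follows the MINORBITS layout the released 2.6 kernels use internally (include/linux/kdev_t.h); the constant and function names here are hypothetical stand-ins, and the user-visible dev_t encoding is a separate matter.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_MINORBITS 8   /* 16-bit dev_t: 8-bit major, 8-bit minor */
#define NEW_MINORBITS 20  /* 32-bit dev_t: 12-bit major, 20-bit minor (assumed) */

static inline uint32_t make_dev(uint32_t major, uint32_t minor, unsigned minorbits)
{
    assert(minor < (1u << minorbits)); /* minor must fit its field */
    return (major << minorbits) | minor;
}

static inline uint32_t dev_major(uint32_t dev, unsigned minorbits)
{
    return dev >> minorbits;
}

static inline uint32_t dev_minor(uint32_t dev, unsigned minorbits)
{
    return dev & ((1u << minorbits) - 1);
}

/* How many unnamed devices major 0 alone can describe. */
static inline uint32_t minors_per_major(unsigned minorbits)
{
    return 1u << minorbits;
}
```

With 20 minor bits, major 0 alone covers 1048576 devices, so the remaining limit is whatever the allocator's bitmap allows, which is what the follow-up patch addresses.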

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

* Re: [NFS] RE: [autofs] multiple servers per automount
From: Mike Waychison @ 2003-10-14 23:12 UTC
  To: H. Peter Anvin; +Cc: linux-kernel, Ian Kent

H. Peter Anvin wrote:
> Followup to:  <3F8C1BB6.9010202@sun.com>
> By author:    Mike Waychison <Michael.Waychison@Sun.COM>
> In newsgroup: linux.dev.kernel
> 
>>Here is the quick fix for this in RH 2.1AS kernels:
>>
>>http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
>>
>>It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0. 
>>
>>I don't know if anyone is working out a better scheme for 
>>get_unnamed_dev in 2.6 yet.  It does need to be done though.  A simple 
>>patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to 
>>PAGE_SIZE, automatically allowing for 32768 unnamed devices.
>>
> 
> 
> dev_t enlargement, which solves this without a bunch of auxiliary
> majors, should be in 2.6.
> 
> 	-hpa

The problem still remains in 2.6 that we limit the count to 256.  I've 
attached a quick patch that I've compiled and tested.  I don't know if 
there is a better way to handle dynamic assignment of minors (haven't 
kept up to date in that realm), but if there is, then we should probably 
use it instead.

Mike Waychison

[-- Attachment #2: max_anon.patch --]
[-- Type: text/plain, Size: 881 bytes --]

===== fs/super.c 1.108 vs edited =====
--- 1.108/fs/super.c	Wed Oct  1 15:36:45 2003
+++ edited/fs/super.c	Tue Oct 14 22:52:12 2003
@@ -528,14 +528,22 @@
  * filesystems which don't use real block-devices.  -- jrs
  */
 
-enum {Max_anon = 256};
-static unsigned long unnamed_dev_in_use[Max_anon/(8*sizeof(unsigned long))];
+enum {Max_anon = PAGE_SIZE * 8};
+static void *unnamed_dev_in_use = NULL;
 static spinlock_t unnamed_dev_lock = SPIN_LOCK_UNLOCKED;/* protects the above */
 
 int set_anon_super(struct super_block *s, void *data)
 {
 	int dev;
 	spin_lock(&unnamed_dev_lock);
+
+	if (!unnamed_dev_in_use)
+		unnamed_dev_in_use = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!unnamed_dev_in_use) {
+		spin_unlock(&unnamed_dev_lock);
+		return -ENOMEM;
+	}
+
 	dev = find_first_zero_bit(unnamed_dev_in_use, Max_anon);
 	if (dev == Max_anon) {
 		spin_unlock(&unnamed_dev_lock);
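One point a reviewer might raise about the patch: get_zeroed_page(GFP_KERNEL) can sleep, and sleeping while holding a spinlock is not allowed. A common way around this is to allocate before taking the lock and discard the copy if another caller installed one first. Below is a userspace sketch of that pattern, with a pthread mutex standing in for unnamed_dev_lock and calloc for get_zeroed_page; the names, the 4096-byte page size, and the -EMFILE return on exhaustion are this sketch's own choices, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <pthread.h>
#include <stdlib.h>

#define PAGE_BYTES 4096                         /* assumed page size */
#define MAX_ANON (PAGE_BYTES * CHAR_BIT)        /* one bit per unnamed device */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS (MAX_ANON / WORD_BITS)

static unsigned long *anon_bitmap;              /* allocated on first use */
static pthread_mutex_t anon_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a free minor in [0, MAX_ANON), or a negative errno. */
int alloc_anon_minor(void)
{
    pthread_mutex_lock(&anon_lock);
    if (!anon_bitmap) {
        /* Don't allocate under the lock: drop it, allocate, retake it. */
        pthread_mutex_unlock(&anon_lock);
        unsigned long *fresh = calloc(NWORDS, sizeof *fresh);
        if (!fresh)
            return -ENOMEM;
        pthread_mutex_lock(&anon_lock);
        if (!anon_bitmap)
            anon_bitmap = fresh;   /* first caller installs its bitmap */
        else
            free(fresh);           /* lost the race; use the winner's */
    }
    for (size_t w = 0; w < NWORDS; w++) {
        if (anon_bitmap[w] == ~0UL)
            continue;              /* whole word in use, skip it */
        for (size_t b = 0; b < WORD_BITS; b++) {
            if (!(anon_bitmap[w] & (1UL << b))) {
                anon_bitmap[w] |= 1UL << b;
                pthread_mutex_unlock(&anon_lock);
                return (int)(w * WORD_BITS + b);
            }
        }
    }
    pthread_mutex_unlock(&anon_lock);
    return -EMFILE;                /* every minor in use */
}

void free_anon_minor(int minor)
{
    assert(minor >= 0 && minor < MAX_ANON);
    pthread_mutex_lock(&anon_lock);
    assert(anon_bitmap != NULL);
    anon_bitmap[minor / WORD_BITS] &= ~(1UL << (minor % WORD_BITS));
    pthread_mutex_unlock(&anon_lock);
}
```

The re-check after retaking the lock is what makes the unlocked allocation safe: two racing callers may both allocate, but only one bitmap is ever installed.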

* Re: RE: [autofs] multiple servers per automount
From: Ian Kent @ 2003-10-15  7:22 UTC
  To: Mike Waychison
  Cc: Joseph V Moss, Ogden, Aaron A., autofs mailing list, nfs,
	Kernel Mailing List

On Tue, 14 Oct 2003, Mike Waychison wrote:

> Ian Kent wrote:
>
> >On Tue, 14 Oct 2003, Joseph V Moss wrote:
> >
> >
> >
> >>The limit is 800, as others have stated, although it can be less than
> >>that if something else is already using up some of the reserved UDP ports.
> >>
> >>I wrote a patch long ago against a 2.2.x kernel to enable it to use
> >>multiple majors for NFS mounts (like the patches now common in several
> >>distros).  I then ran into the 800 limit in the RPC layer.  After changing
> >>the RPC layer to count up from 0, instead of down from 800, with no real
> >>upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
> >>I'm sure I could have done many thousand if I had had that many filesystems
> >>around to mount.  Obviously, after 1024, it wasn't using reserved ports
> >>anymore, but it didn't seem to matter.
> >>
> >>Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
> >>the RPC layer is different enough between 2.2 and 2.4 that it didn't work
> >>right off.  Bumping it up to somewhere around 1024 should work, but using
> >>non-reserved ports didn't seem to work when I made a simple attempt.
> >>
> >>Of course, the real fix for the NFS layer is the expansion of the minor
> >>numbers that's already occurred in 2.6 and the RPC layer problems should
> >>be fixed by multiplexing multiple mounts on the same port.
> >>
> >>
> >>
> >>
> >
> >I don't see that expansion in 2.6 (test6). It looks to me like the
> >allocation is done in set_anon_super (in fs/super.c), and that looks like
> >it is restricted to 256. Please correct me if I'm wrong. I can't see any
> >change to the number of unnamed devices.
> >
> >
> >
>
> Here is the quick fix for this in RH 2.1AS kernels:
>
> http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
>
> It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0.
>
> I don't know if anyone is working out a better scheme for
> get_unnamed_dev in 2.6 yet.  It does need to be done though.  A simple
> patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to
> PAGE_SIZE, automatically allowing for 32768 unnamed devices.
>
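The figures quoted above are plain arithmetic. Each major contributes 256 minors in the old 8-bit minor space, and a one-page bitmap holds one bit per device. The compile-time checks below reproduce those numbers. This is a sketch, not kernel code: it assumes a 4096-byte page (typical for i386; other architectures differ), and the macro names are illustrative only.

```c
#include <assert.h>

#define MINORS_PER_MAJOR   256UL   /* 8-bit minor numbers */
#define ASSUMED_PAGE_SIZE  4096UL  /* assumption: typical i386 page size */

/* Stock kernels: every unnamed device lives under major 0. */
#define STOCK_UNNAMED        (1UL * MINORS_PER_MAJOR)
/* moreunnamed-style patch: majors 0, 12, 14, 38 and 39. */
#define MULTI_MAJOR_UNNAMED  (5UL * MINORS_PER_MAJOR)
/* A one-page bitmap, one bit per device. */
#define ONE_PAGE_UNNAMED     (ASSUMED_PAGE_SIZE * 8UL)

static_assert(STOCK_UNNAMED == 256, "major 0 alone");
static_assert(MULTI_MAJOR_UNNAMED == 1280, "five majors of 256 minors");
static_assert(ONE_PAGE_UNNAMED == 32768, "4096 bytes * 8 bits");
```

These numbers only cover the device-number limit. As noted earlier in the thread, the RPC layer's use of reserved ports caps NFS mounts lower, at around 800.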

OK. Sounds like a good job for me to do (simple - maybe).
I'll spend a while looking for possible side effects.

Do you think that the possible NFS port allocation problems should hold up
this work or should it drive updates to NFS?

Comments from anyone about where to check and what to watch out for are
welcome.

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-14 23:12           ` Mike Waychison
@ 2003-10-15 10:28             ` Ingo Oeser
  2003-10-15 16:16               ` Mike Waychison
  2003-10-23 13:37               ` Ian Kent
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Oeser @ 2003-10-15 10:28 UTC (permalink / raw)
  To: Mike Waychison
  Cc: linux-kernel, Ian Kent

On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
> The problem still remains in 2.6 that we limit the count to 256.  I've
> attached a quick patch that I've compiled and tested.  I don't know if
> there is a better way to handle dynamic assignment of minors (haven't
> kept up to date in that realm), but if there is, then we should probably
>   use it instead.


In your patch you allocate inside the spinlock.

I would suggest doing something like the following:

unsigned long local = 0;

/* get_zeroed_page(GFP_KERNEL) may sleep, so call it before taking the lock. */
if (!unnamed_dev_in_use) {
    local = get_zeroed_page(GFP_KERNEL);

    if (!local)
        return -ENOMEM;
}

spin_lock(&unnamed_dev_lock);
mb();
if (!unnamed_dev_in_use) {
    unnamed_dev_in_use = (void *)local;

    /* Used globally, don't free now */
    local = 0;
}

/*
 * Do the lookup and alloc
 */

spin_unlock(&unnamed_dev_lock);

/* Free page, because of race on allocation. */
if (local)
    free_page(local);


This installs the pointer atomically (under the lock) while still doing the
allocation outside the non-sleeping spinlock.
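The suggestion follows the common "allocate outside the lock, install under the lock, free the loser's copy" pattern. Below is a minimal user-space sketch of the same idea, not kernel code. It makes these assumptions: POSIX threads, a pthread mutex standing in for the spinlock, and calloc()/free() standing in for get_zeroed_page()/free_page(). The names bitmap, bitmap_lock and ensure_bitmap() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define BITMAP_BYTES 4096UL   /* stands in for PAGE_SIZE (assumed 4 KiB) */

/* Models unnamed_dev_in_use; _Atomic only so the unlocked hint is not a data race. */
static _Atomic(unsigned char *) bitmap = NULL;
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Make sure the shared bitmap exists.  The allocation happens before the
 * lock is taken; the pointer is only published while the lock is held.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int ensure_bitmap(void)
{
    unsigned char *local = NULL;

    /* Unlocked hint: skip allocating if someone already installed it. */
    if (atomic_load_explicit(&bitmap, memory_order_relaxed) == NULL) {
        local = calloc(BITMAP_BYTES, 1);   /* zeroed, like get_zeroed_page() */
        if (local == NULL)
            return -1;
    }

    pthread_mutex_lock(&bitmap_lock);
    /* Re-check under the lock: another thread may have won the race. */
    if (atomic_load_explicit(&bitmap, memory_order_relaxed) == NULL) {
        atomic_store_explicit(&bitmap, local, memory_order_relaxed);
        local = NULL;                      /* now owned by the global pointer */
    }
    /* Invariant: the bitmap exists; the bit search would go here. */
    assert(atomic_load_explicit(&bitmap, memory_order_relaxed) != NULL);
    pthread_mutex_unlock(&bitmap_lock);

    free(local);   /* lost the race: drop our copy (free(NULL) is a no-op) */
    return 0;
}
```

Every caller ends up using the same installed buffer. A thread that loses the race allocates a page and immediately frees it again. In user space a mutex may be held across a blocking call, but the kernel spinlock may not, which is why the allocation is hoisted out of the locked region.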


Regards

Ingo Oeser



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-15 10:28             ` Ingo Oeser
@ 2003-10-15 16:16               ` Mike Waychison
  2003-10-23 13:37               ` Ian Kent
  1 sibling, 0 replies; 23+ messages in thread
From: Mike Waychison @ 2003-10-15 16:16 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Mike Waychison, linux-kernel, Ian Kent

[-- Attachment #1: Type: text/plain, Size: 717 bytes --]

Ingo Oeser wrote:
> On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
> 
>>The problem still remains in 2.6 that we limit the count to 256.  I've
>>attached a quick patch that I've compiled and tested.  I don't know if
>>there is a better way to handle dynamic assignment of minors (haven't
>>kept up to date in that realm), but if there is, then we should probably
>>  use it instead.
> 
> 
> 
> In your patch you allocate inside the spinlock.
> 
> I would suggest doing something like the following:
> 

Better yet, we could move the allocation into an __init function that panics if
the allocation fails (this should be the desired behaviour).  That way
we don't even have to grab the lock for it.

Mike Waychison

[-- Attachment #2: max_anon_2.patch --]
[-- Type: text/plain, Size: 1592 bytes --]

===== fs/namespace.c 1.49 vs edited =====
--- 1.49/fs/namespace.c	Thu Jul 17 22:30:49 2003
+++ edited/fs/namespace.c	Wed Oct 15 15:59:11 2003
@@ -23,6 +23,7 @@
 #include <linux/mount.h>
 #include <asm/uaccess.h>
 
+extern void __init super_init(void);
 extern int __init init_rootfs(void);
 extern int __init sysfs_init(void);
 
@@ -1154,6 +1155,7 @@
 		d++;
 		i--;
 	} while (i);
+	super_init();
 	sysfs_init();
 	init_rootfs();
 	init_mount_tree();
===== fs/super.c 1.108 vs edited =====
--- 1.108/fs/super.c	Wed Oct  1 15:36:45 2003
+++ edited/fs/super.c	Wed Oct 15 15:59:50 2003
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/smp_lock.h>
+#include <linux/init.h>
 #include <linux/acct.h>
 #include <linux/blkdev.h>
 #include <linux/quotaops.h>
@@ -527,15 +528,22 @@
  * Unnamed block devices are dummy devices used by virtual
  * filesystems which don't use real block-devices.  -- jrs
  */
-
-enum {Max_anon = 256};
-static unsigned long unnamed_dev_in_use[Max_anon/(8*sizeof(unsigned long))];
+enum {Max_anon = PAGE_SIZE * 8};
+static void *unnamed_dev_in_use;
 static spinlock_t unnamed_dev_lock = SPIN_LOCK_UNLOCKED;/* protects the above */
 
+void __init super_init(void)
+{
+	unnamed_dev_in_use = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!unnamed_dev_in_use)
+		panic("Could not allocate anonymous device map");
+}
+
 int set_anon_super(struct super_block *s, void *data)
 {
 	int dev;
 	spin_lock(&unnamed_dev_lock);
+
 	dev = find_first_zero_bit(unnamed_dev_in_use, Max_anon);
 	if (dev == Max_anon) {
 		spin_unlock(&unnamed_dev_lock);
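The following is a user-space model of the allocator this patch changes, not kernel code. It allocates a one-page bitmap once at start-up, as super_init() does, and searches it for the first zero bit in the style of find_first_zero_bit(). The model assumes a 4 KiB page and omits locking; the kernel holds unnamed_dev_lock around the search. The names anon_init(), anon_get() and anon_put() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL                          /* assumed page size */
#define MAX_ANON        (MODEL_PAGE_SIZE * CHAR_BIT)    /* 32768 with 4 KiB pages */
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)

static unsigned long *in_use;   /* models unnamed_dev_in_use */

/* Like super_init(): allocate the zeroed map once at start-up or give up. */
static void anon_init(void)
{
    in_use = calloc(MODEL_PAGE_SIZE / sizeof(unsigned long), sizeof(unsigned long));
    if (in_use == NULL) {
        fputs("Could not allocate anonymous device map\n", stderr);
        abort();                /* user-space stand-in for panic() */
    }
}

/* Like set_anon_super(): claim the lowest free minor, or return -1 if full. */
static long anon_get(void)
{
    for (unsigned long i = 0; i < MAX_ANON; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);

        if (!(in_use[i / BITS_PER_WORD] & mask)) {
            in_use[i / BITS_PER_WORD] |= mask;
            return (long)i;
        }
    }
    return -1;
}

/* Release a minor previously returned by anon_get(). */
static void anon_put(unsigned long dev)
{
    assert(dev < MAX_ANON);
    in_use[dev / BITS_PER_WORD] &= ~(1UL << (dev % BITS_PER_WORD));
}
```

Releasing minor 1 and allocating again hands 1 back out, because the search always starts from bit 0.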

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-15 10:28             ` Ingo Oeser
  2003-10-15 16:16               ` Mike Waychison
@ 2003-10-23 13:37               ` Ian Kent
  2003-10-23 17:00                 ` Mike Waychison
  1 sibling, 1 reply; 23+ messages in thread
From: Ian Kent @ 2003-10-23 13:37 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Mike Waychison, Kernel Mailing List


Please forgive my ignorance Ingo but ...

I suffer from race condition blindness. A terrible affliction when one is 
trying to understand the subtleties of the kernel, but I'm trying.

While I am not questioning your suggestion, I have thought about the code 
and fail to see the race you point out. Please help me along.

On Wed, 15 Oct 2003, Ingo Oeser wrote:

> On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
> > The problem still remains in 2.6 that we limit the count to 256.  I've
> > attached a quick patch that I've compiled and tested.  I don't know if
> > there is a better way to handle dynamic assignment of minors (haven't
> > kept up to date in that realm), but if there is, then we should probably
> >   use it instead.
> 
> 
> In your patch you allocate inside the spinlock.

Do you mean we don't want to sleep under the spin lock?
Would a GFP_ATOMIC make a difference to the analysis?

> 
> I would suggest doing something like the following:
> 
> unsigned long local = 0;
> 
> /* get_zeroed_page(GFP_KERNEL) may sleep, so call it before taking the lock. */
> if (!unnamed_dev_in_use) {
>     local = get_zeroed_page(GFP_KERNEL);
> 
>     if (!local)
>         return -ENOMEM;
> }
> 
> spin_lock(&unnamed_dev_lock);
> mb();
> if (!unnamed_dev_in_use) {
>     unnamed_dev_in_use = (void *)local;
> 
>     /* Used globally, don't free now */
>     local = 0;
> }
> 
> /*
>  * Do the lookup and alloc
>  */
> 
> spin_unlock(&unnamed_dev_lock);
> 
> /* Free page, because of race on allocation. */
> if (local)
>     free_page(local);
> 
> 
> This installs the pointer atomically (under the lock) while still doing the
> allocation outside the non-sleeping spinlock.

As I said, please give me a hint about your thinking here,
and about the use of a memory barrier as well ... umm?

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-23 13:37               ` Ian Kent
@ 2003-10-23 17:00                 ` Mike Waychison
  2003-10-23 17:09                   ` Tim Hockin
  2003-10-24  0:47                   ` Ian Kent
  0 siblings, 2 replies; 23+ messages in thread
From: Mike Waychison @ 2003-10-23 17:00 UTC (permalink / raw)
  To: Ian Kent; +Cc: Ingo Oeser, Kernel Mailing List

Ian Kent wrote:

>On Wed, 15 Oct 2003, Ingo Oeser wrote:
>  
>
>>In your patch you allocate inside the spinlock.
>>    
>>
>
>Do you mean we don't want to sleep under the spin lock?
>Would a GFP_ATOMIC make a difference to the analysis?
>  
>
Yes, sleeping within a spinlock is bad practice because it may
eventually deadlock.  Pretend that the lock is taken, the call to
kmalloc is made, the mm system doesn't have any immediately free memory,
and through some flow of execution it needs some pseudo-block
device backed filesystem to be mounted -> deadlock.  I have no
idea if this is currently a likely scenario; however, not sleeping within
a lock is 'The Right Thing', and sleeping while holding one should be avoided at all costs.

GFP_ATOMIC should be avoided in most circumstances, particularly where
the code can be refactored to allow for the sleep.  An atomic allocation
cannot wait for memory to be reclaimed, so it is less likely to find free
memory and thus more likely to fail.

>>I would suggest doing something like the following:
>>
>>unsigned long local = 0;
>>
>>/* get_zeroed_page(GFP_KERNEL) may sleep, so call it before taking the lock. */
>>if (!unnamed_dev_in_use) {
>>    local = get_zeroed_page(GFP_KERNEL);
>>
>>    if (!local)
>>        return -ENOMEM;
>>}
>>
>>spin_lock(&unnamed_dev_lock);
>>mb();
>>if (!unnamed_dev_in_use) {
>>    unnamed_dev_in_use = (void *)local;
>>
>>    /* Used globally, don't free now */
>>    local = 0;
>>}
>>
>>/*
>> * Do the lookup and alloc
>> */
>>
>>spin_unlock(&unnamed_dev_lock);
>>
>>/* Free page, because of race on allocation. */
>>if (local)
>>    free_page(local);
>>
>>
>>This installs the pointer atomically (under the lock) while still doing the
>>allocation outside the non-sleeping spinlock.
>>    
>>
>
>As I said please give me a hint about your thinking here.
>And the use of a memory barrier as well ... umm?
>
>  
>

Ingo's patch simply moved the allocation outside the spinlock.  See my
later patch about moving the allocation into an __init section, which is
probably the cleaner thing to do and doesn't require grabbing the page
and using it conditionally.

As for the mb(), I *thought* that a spinlock implied a memory barrier;
however, I think he put it there because it solves the age-old badness of
double-checked locking (search Google for good explanations of the badness).
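The classic double-checked locking bug appears when a fast path reads a shared pointer without the lock and then dereferences it. Compiler or CPU reordering can then let that path see the pointer before the contents it points to. As far as the snippet shows, Ingo's version only tests the pointer outside the lock and does all bitmap access under it, so it does not rely on such a fast path.

For contrast, here is a user-space sketch of double-checked locking with explicit ordering. It assumes C11 atomics and POSIX threads, not kernel primitives, and struct table and get_table() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct table {
    int ready;                   /* set before the table is published */
    unsigned char bits[4096];    /* illustrative payload */
};

static _Atomic(struct table *) shared = NULL;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the lazily created table, or NULL if allocation failed. */
static struct table *get_table(void)
{
    /* Fast path without the lock: the acquire load pairs with the release
     * store below, so a non-NULL pointer implies initialised contents. */
    struct table *t = atomic_load_explicit(&shared, memory_order_acquire);
    if (t != NULL) {
        assert(t->ready == 1);
        return t;
    }

    pthread_mutex_lock(&shared_lock);
    /* Second check: another thread may have published while we waited. */
    t = atomic_load_explicit(&shared, memory_order_relaxed);
    if (t == NULL) {
        t = calloc(1, sizeof(*t));
        if (t != NULL) {
            t->ready = 1;        /* initialise first ... */
            atomic_store_explicit(&shared, t, memory_order_release); /* ... then publish */
        }
    }
    pthread_mutex_unlock(&shared_lock);
    return t;
}
```

Without the release/acquire pair, a plain pointer store and load would give the fast path no guarantee that `ready` is visible, and that missing guarantee is the "badness" referred to above.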

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-23 17:00                 ` Mike Waychison
@ 2003-10-23 17:09                   ` Tim Hockin
  2003-10-24  0:47                   ` Ian Kent
  1 sibling, 0 replies; 23+ messages in thread
From: Tim Hockin @ 2003-10-23 17:09 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Ian Kent, Ingo Oeser, Kernel Mailing List

On Thu, Oct 23, 2003 at 01:00:57PM -0400, Mike Waychison wrote:
> >Would a GFP_ATOMIC make a difference to the analysis?
 
> Yes, sleeping within a spinlock is bad practice because it may
> eventually deadlock.  Pretend that the lock is taken, the call to
> kmalloc is made, the mm system doesn't have any immediately free memory,
> and through some flow of execution it needs some pseudo-block
> device backed filesystem to be mounted -> deadlock.  I have no
> idea if this is currently a likely scenario; however, not sleeping within
> a lock is 'The Right Thing', and sleeping while holding one should be avoided at all costs.

it's worse than that.  It's forbidden.  It's a VERY likely deadlock scenario
in the general sense, even if this particular case is not.  If you need to
lock something and you need to sleep holding that lock, use a semaphore.

-- 
Notice that as computers are becoming easier and easier to use,
suddenly there's a big market for "Dummies" books.  Cause and effect,
or merely an ironic juxtaposition of unrelated facts?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-23 17:00                 ` Mike Waychison
  2003-10-23 17:09                   ` Tim Hockin
@ 2003-10-24  0:47                   ` Ian Kent
  2003-10-24  1:42                     ` Tim Hockin
  1 sibling, 1 reply; 23+ messages in thread
From: Ian Kent @ 2003-10-24  0:47 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Ingo Oeser, Kernel Mailing List


Thanks for the description.

I thought it was bad to call a function that could block while
holding a lock. At least I was close to right this time.

I wasn't aware of the double-checked locking badness; I'll see what I can find.

On Thu, 23 Oct 2003, Mike Waychison wrote:

>
> Ingo's patch simply moved the allocation outside the spinlock.  See my
> later patch about moving the allocation into an __init section, which is
> probably the cleaner thing to do and doesn't require grabbing the page
> and using it conditionally.
>

Missed that when I returned to it. Found it now.

That is clearly a better way to do it.

Is there any chance this would be accepted into 2.6.0?

I think it's quite important, hopefully others do as well.


-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [NFS] RE: [autofs] multiple servers per automount
  2003-10-24  0:47                   ` Ian Kent
@ 2003-10-24  1:42                     ` Tim Hockin
  0 siblings, 0 replies; 23+ messages in thread
From: Tim Hockin @ 2003-10-24  1:42 UTC (permalink / raw)
  To: Ian Kent; +Cc: Mike Waychison, Ingo Oeser, Kernel Mailing List, torvalds

Recap: Mike Waychison posted a simple patch to make the Max_anon bitmap of
anonymous devices (used by NFS mounts etc.) occupy exactly one page.

On Fri, Oct 24, 2003 at 08:47:57AM +0800, Ian Kent wrote:
> Is there any chance this would be accepted into 2.6.0?
> 
> I think it's quite important, hopefully others do as well.


Wouldn't it be saner to have a sysctl to adjust that?  From 1 page to
2^20/(PAGE_SIZE * CHAR_BIT) pages?  Perhaps just in page-sized increments?
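The upper bound above assumes the 20-bit minor numbers of the 2.6 dev_t, giving 2^20 possible minors, and each bitmap page covers PAGE_SIZE * CHAR_BIT of them. With 4 KiB pages that is 2^20 / 32768 = 32 pages. Below is a sketch of how a hypothetical sysctl value might be rounded up to whole pages and clamped to that range. The helper anon_pages_for() and the 4 KiB page size are assumptions, not existing kernel code.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SIZE 4096UL                        /* assumed 4 KiB pages */
#define BITS_PER_PAGE   (MODEL_PAGE_SIZE * CHAR_BIT)  /* 32768 devices per page */
#define MINOR_BITS      20                            /* 2.6 dev_t minor width */
#define MAX_PAGES       ((1UL << MINOR_BITS) / BITS_PER_PAGE)

/* The upper bound from the thread: 2^20 / (4096 * 8) = 32 pages. */
static_assert(MAX_PAGES == 32, "2^20 minors need 32 one-page bitmaps");

/*
 * Hypothetical sysctl helper: bitmap pages needed for 'wanted' anonymous
 * devices, rounded up to whole pages and clamped to [1, MAX_PAGES].
 */
static unsigned long anon_pages_for(unsigned long wanted)
{
    unsigned long pages;

    if (wanted >= MAX_PAGES * BITS_PER_PAGE)
        return MAX_PAGES;            /* cannot exceed the minor number space */
    pages = (wanted + BITS_PER_PAGE - 1) / BITS_PER_PAGE;
    return pages ? pages : 1;        /* never below the one-page default */
}
```

For example, a request for 40000 devices rounds up to 2 pages (65536 devices). Anything at or above 2^20 is clamped to 32 pages.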

This would be a simple patch... But maybe it's not 'stabilization' for
2.6.0.

Maybe the simple version in 2.6.0 and the right version in 2.6.1?

Linus?


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2003-10-10 15:16 multiple servers per automount Ogden, Aaron A.
2003-10-13  3:23 ` [NFS] " Ian Kent
2003-10-14  7:05   ` Joseph V Moss
2003-10-14 13:37     ` RE: [autofs] " Ian Kent
2003-10-14 13:37       ` Ian Kent
2003-10-14 15:52       ` [NFS] " Mike Waychison
2003-10-14 15:52         ` [NFS] RE: [autofs] " Mike Waychison
2003-10-14 20:44         ` H. Peter Anvin
2003-10-14 23:12           ` Mike Waychison
2003-10-15 10:28             ` Ingo Oeser
2003-10-15 16:16               ` Mike Waychison
2003-10-23 13:37               ` Ian Kent
2003-10-23 17:00                 ` Mike Waychison
2003-10-23 17:09                   ` Tim Hockin
2003-10-24  0:47                   ` Ian Kent
2003-10-24  1:42                     ` Tim Hockin
2003-10-15  7:22         ` Ian Kent
2003-10-15  7:22           ` [NFS] " Ian Kent
2003-10-15  7:22           ` Ian Kent
  -- strict thread matches above, loose matches on Subject: below --
2003-10-10 17:02 [NFS] " Eric Werme USG
2003-10-10 15:43 Ogden, Aaron A.
2003-10-10 15:54 ` Mike Waychison
2003-10-10 15:10 Re: [autofs] " Lever, Charles
2003-10-13  3:05 ` [NFS] " Ian Kent
