more SMP issues

All of lore.kernel.org
 help / color / mirror / Atom feed

* more SMP issues
@ 2002-03-26 19:22 Andrew Theurer
  2002-03-27  2:59 ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Theurer @ 2002-03-26 19:22 UTC (permalink / raw)
  To: nfs

Hello,

I am still seeing some scaling problems with my SMP nfs server, so I did some 
more tests.  I has suspected that there were not enough nfsd threads actually 
doing work, since my run queue was so short (2 or less) and CPU utilization 
was below 35% (this is a 4-way SMP).  So, before the tests, I added nfsd_busy 
to /proc/net/rpc/nfsd, so I could monitor exactly how many nfsd threads were 
getting to svc_process() at any point in time.  I would then monitor this 
during my tests, so I know how many nfsd threads are actually getting to 
svc_process().  

My first test is a NFS client read test.  48 clients each read a 200MB file 
from the same server.  The elapsed time is recorded for all the clients to 
finish, and then the throughput is calculated.  Results from this test 
 raised concerns because I did not see a significant improvement from 
uniprocessor to 4-way.  I had been running this test for udp and this would 
be the result:

Test    SMP CPU proto vers rwsize nfsdcount nfsdbusy secs MB/sec NFSops/sec
nfsread 4P   34  udp   3    8k     128       1        109   88     11009

nfsd_busy would never exceed 1 in this test.  I have tried with various qty 
of nfsd threads, and it always has just one busy thread with one exception:  
If I set the number of nfsd threads to 2, nfsd_busy will stay at 2 75% of the 
time during the test, CPU util will be about 55%, and I'll get maybe 15-20% 
better throughput.  If the nfsd thread count is set to anything else, 
nfsd_busy will not exceed 1.  

I decided to try tcp protocol just to get a comparison, and surprisingly it 
_could_ have more than 1 for nfsd_busy:

Test    SMP CPU proto vers rwsize nfsdcount nfsdbusy secs MB/sec NFSops/sec
nfsread 4P  100 tcp    3    8k     128       12       110   87     10903

Nfsd_busy reached a maximum of 12, and probably averaged around 8 during 
the test.  The performance was not better, but I still don't understand why 
there can be more than one nfsd thread busy with tcp, and not with udp??  I 
know that having nfsd_busy at 2 with udp improves performance.  If we can get 
udp to consistently use more than one thread at one time, I think we can 
boost performance quite a bit. 

Now, I was concerned that I may have some sort of network throughput 
limitation here, since I was close to 100MB/sec, and in my experience, even 
with multiple Gbps adapters (I have 4), the mem copies involved here tend to 
saturate the poor 100 MHz memory bus this server has.  I was not sure if this 
was causing a problem here, so I setup an other test.  This time all 48 
clients do a "ls -lR" on a 2.5.7 kernel tree on the server.  This generates a 
very high number of nfs requests with relatively low network throughput 
compared the the read test.  Total network throughput was under 4 MB/sec 
(send and recv).  Here are the results:

Test    SMP CPU proto vers rwsize nfsdcount nfsdbusy secs MB/sec NFSops/sec
nfsls   4P  100  tcp   3    8k     128       11       127    -    30538
nfsls   4P  34   udp   3    8k     128        1       112    -    34750

Again, with udp I cannot get nfsd_busy to exceed 1.  TCP has the same 
behavior as the last test.  My first thought was that maybe the way the bits  
are set and cleared for the socket (SK_BUSY, SK_DATA, SK_CONN, etc) could be 
causing a problem here, but I cannot confirm that.  It appears they behave 
the same way for tcp and udp, so that is probably not it.  So, I am now stuck 
here.  If there is anyone interested in helping me investigate this (Neil?), 
please let me know. Thanks for your help.

-Andrew

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: more SMP issues
  2002-03-26 19:22 more SMP issues Andrew Theurer
@ 2002-03-27  2:59 ` Neil Brown
  2002-03-27 16:37   ` Andrew Theurer
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2002-03-27  2:59 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: nfs

On Tuesday March 26, habanero@us.ibm.com wrote:
> Again, with udp I cannot get nfsd_busy to exceed 1.  TCP has the same 
> behavior as the last test.  My first thought was that maybe the way the bits  
> are set and cleared for the socket (SK_BUSY, SK_DATA, SK_CONN, etc) could be 
> causing a problem here, but I cannot confirm that.  It appears they behave 
> the same way for tcp and udp, so that is probably not it.  So, I am now stuck 
> here.  If there is anyone interested in helping me investigate this (Neil?), 
> please let me know. Thanks for your help.
> 

Its very odd.

I'm doing some testing on 2.4.18 plus my patches, using UDP, and I'm
definately getting more than 12 threads being used at times.  But this
is a mixed load, not a read-only load.

Could you

 echo 256 > /proc/sys/sunrpc/rpc_debug ; sleep 5 ; echo 0 >  /proc/sys/sunrpc/rpc_debug 

while the udp reading is happening and see if lines mentioning
   "no space"
appear in the log.  That is about the only thing that I can think of.

The fact that you seem to get more threads if you only ask for 2 is
really odd....

NeilBrown


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: more SMP issues
  2002-03-27  2:59 ` Neil Brown
@ 2002-03-27 16:37   ` Andrew Theurer
  2002-03-28  4:05     ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Theurer @ 2002-03-27 16:37 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

> Its very odd.
>
> I'm doing some testing on 2.4.18 plus my patches, using UDP, and I'm
> definately getting more than 12 threads being used at times.  But this
> is a mixed load, not a read-only load.
>
> Could you
>
>  echo 256 > /proc/sys/sunrpc/rpc_debug ; sleep 5 ; echo 0 > 
> /proc/sys/sunrpc/rpc_debug
>
> while the udp reading is happening and see if lines mentioning
>    "no space"
> appear in the log.  That is about the only thing that I can think of.

Here is udp:

Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 served by daemon 
f4cd8e00
Mar 27 08:15:52 sambaperf kernel: svc: got len=132
Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 busy, not enqueued
Mar 27 08:15:52 sambaperf kernel: svc: socket f50fcda0(inet f6ec17c0), 
write_space busy=1
Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 busy, not enqueued
Mar 27 08:15:52 sambaperf kernel: svc: socket f50fcda0(inet f6ec17c0), 
count=140, busy=1
Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 busy, not enqueued
Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 busy, not enqueued
Mar 27 08:15:52 sambaperf kernel: svc: service f4cf0a00, releasing skb 
f4909340
Mar 27 08:15:52 sambaperf kernel: svc: socket f50fcda0 sendto([f4cd4000 
8320... ], 1, 8320) = 8320

I also took one for tcp just to compare:

Mar 27 08:20:28 sambaperf kernel: svc: socket f4988cc0 busy, not enqueued
Mar 27 08:20:28 sambaperf kernel: svc: socket f4a0a360 busy, not enqueued
Mar 27 08:20:28 sambaperf kernel: svc: socket f714c360 sendto([f4e04000 
8324... ], 1, 8324) = 8324
Mar 27 08:20:28 sambaperf kernel: svc: socket f4a0a360 busy, not enqueued
Mar 27 08:20:28 sambaperf kernel: svc: server f436e600 waiting for data (to = 
30000)
Mar 27 08:20:28 sambaperf kernel: svc: socket f43159a0 busy, not enqueued
Mar 27 08:20:28 sambaperf kernel: svc: server f439b800 waiting for data (to = 
30000)
Mar 27 08:20:28 sambaperf kernel: svc: socket f4325040 TCP data ready (svsk 
f446e120)
Mar 27 08:20:28 sambaperf kernel: svc: socket f4325040 busy, not enqueued
Mar 27 08:20:29 sambaperf kernel: svc: server f502d400, socket f714c360, 
inuse=1

I also setup another debug, where I prinkt'd at all the SK bit set and 
clears, and a couple other areas.  In this test, I only sent one read request 
from one client:

pid: 0 svc: socket f5b97e40(inet f4f29960), count=140, SK_BUSY=0<6> and 
setting SK_DATA in svc_udp_data_ready
pid: 0 testing SK_BUSY in svc_sock_enqueue
pid: 0 setting SK_BUSY in svc_sock_enqueue
pid: 1390 clearing SK_DATA in svc_udp_recvfrom
pid: 1390 setting SK_DATA in svc_udp_recvfrom
pid: 1390 clearing SK_BUSY in svc_sock_received
pid: 1390 testing SK_BUSY in svc_sock_enqueue
pid: 1390 setting SK_BUSY in svc_sock_enqueue
pid: 1390 calling svc_process
pid: 1390 testing SK_BUSY in svc_sock_enqueue
pid: 1390 testing SK_BUSY in svc_sock_enqueue
pid: 1387 clearing SK_DATA in svc_udp_recvfrom
pid: 1387 clearing SK_BUSY in svc_sock_received
svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space
svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space
svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space

My only concern was that SK_BUSY was set, then svc_process() is called.  I 
thought SK_BUSY should not be set during svc_process(). 

> The fact that you seem to get more threads if you only ask for 2 is
> really odd....

Yes, that really shocked me.  I tried it several times.  I tried 1 to 8 
threads, and all but 2 had nfsd_busy of 1, 2 threads had nsfd_busy of 2 most 
of the time, and throughput was up.  

I am going to setup SpecSFS for a comparison, but IMO I like this test 
because it runs so quickly and exposes a potential problem.

Thanks for your help,

-Andrew

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: more SMP issues
  2002-03-27 16:37   ` Andrew Theurer
@ 2002-03-28  4:05     ` Neil Brown
  2002-03-28  4:28       ` Neil Brown
  2002-03-28  4:58       ` Andrew Theurer
  0 siblings, 2 replies; 6+ messages in thread
From: Neil Brown @ 2002-03-28  4:05 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: nfs

On Wednesday March 27, habanero@us.ibm.com wrote:
> 
> Here is udp:
> 
> Mar 27 08:15:52 sambaperf kernel: svc: socket f6ec17c0 served by daemon 
> f4cd8e00
> Mar 27 08:15:52 sambaperf kernel: svc: got len=132
....
> 
> I also took one for tcp just to compare:
> 
> Mar 27 08:20:28 sambaperf kernel: svc: socket f4988cc0 busy, not enqueued
> Mar 27 08:20:28 sambaperf kernel: svc: socket f4a0a360 busy, not enqueued
> Mar 27 08:20:28 sambaperf kernel: svc: socket f714c360 sendto([f4e04000 
> 8324... ], 1, 8324) = 8324
....

Nothing odd here.... thanks.

> 
> I also setup another debug, where I prinkt'd at all the SK bit set and 
> clears, and a couple other areas.  In this test, I only sent one read request 
> from one client:
> 
> pid: 0 svc: socket f5b97e40(inet f4f29960), count=140, SK_BUSY=0<6> and 
> setting SK_DATA in svc_udp_data_ready
> pid: 0 testing SK_BUSY in svc_sock_enqueue
> pid: 0 setting SK_BUSY in svc_sock_enqueue
> pid: 1390 clearing SK_DATA in svc_udp_recvfrom
> pid: 1390 setting SK_DATA in svc_udp_recvfrom
> pid: 1390 clearing SK_BUSY in svc_sock_received
> pid: 1390 testing SK_BUSY in svc_sock_enqueue
> pid: 1390 setting SK_BUSY in svc_sock_enqueue
> pid: 1390 calling svc_process
> pid: 1390 testing SK_BUSY in svc_sock_enqueue
> pid: 1390 testing SK_BUSY in svc_sock_enqueue
> pid: 1387 clearing SK_DATA in svc_udp_recvfrom
> pid: 1387 clearing SK_BUSY in svc_sock_received
> svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space
> svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space
> svc: socket f5b97e40(inet f4f29960), write_space SK_BUSY=0 in svc_write_space
> 
> My only concern was that SK_BUSY was set, then svc_process() is called.  I 
> thought SK_BUSY should not be set during svc_process(). 

No, this is normal.
After taking a request off the socket, the process calls
svc_sock_enqueue incase there is another packet.  The socket gets
attached to some process and marked busy, until that process wakes up
and deals with the next packet.... but that gives me an idea....

> 
> > The fact that you seem to get more threads if you only ask for 2 is
> > really odd....
> 
> Yes, that really shocked me.  I tried it several times.  I tried 1 to 8 
> threads, and all but 2 had nfsd_busy of 1, 2 threads had nsfd_busy of 2 most 
> of the time, and throughput was up.  

Ok, here is a theory.
All your data is in cache, right?
So when NFSD gets a read request, it hurtles along processing it and
never has to block all the way until it is completed.  As this is done
in less than the scheduling interval, it never gets interrupted and no
other nfsd thread gets to run on that CPU until this one has finished
it's request.
This would give you a situation in which you never have more
concurrent nfsd's than you have processors.

But you are getting even worse than that:  You have only one active
thread on a quad processor machine.  I blithely blame this on the
scheduler.
Maybe all the nfsd threads are initially attached to the same
processor, and for some reason they never get migrated to another 
processor.   Is this a question for Ingo Molnar??

Maybe when you only have 2 nfsds, the dynamics are a bit different and
one thread gets migrated and you then get dual processor performance.

Possibly we could put some scheduler call in just after the wake_up()
in svcsock.c which says to go bother some other CPU???

Does that make any sense, or sound plausable?

> 
> I am going to setup SpecSFS for a comparison, but IMO I like this test 
> because it runs so quickly and exposes a potential problem.
> 

That's what I was running.... but I had a bug is another
under-development patch and it locked up chasing a circular list at
about 2000 Ops/Sec (That is an unpublished number. I didn't publish
it. You cannot say that I did).

NeilBrown

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: more SMP issues
  2002-03-28  4:05     ` Neil Brown
@ 2002-03-28  4:28       ` Neil Brown
  2002-03-28  4:58       ` Andrew Theurer
  1 sibling, 0 replies; 6+ messages in thread
From: Neil Brown @ 2002-03-28  4:28 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: nfs

On Thursday March 28, neilb@cse.unsw.edu.au wrote:
> 
> Maybe when you only have 2 nfsds, the dynamics are a bit different and
> one thread gets migrated and you then get dual processor performance.
> 
> Possibly we could put some scheduler call in just after the wake_up()
> in svcsock.c which says to go bother some other CPU???

Something that might be worth a try is is to change the "list_add"
int svc_serv_enqueue to "list_add_tail".

With list_add, it just uses as many threads as it needs and leaves the
others completely ideal (and so not poluting cache).
With list_add_tail, it will cycle through all the threads, and
that might cause enough turbulence to get all the processors working.

It's not the *right* solution, but it is probably worth a try.

NeilBrown

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: more SMP issues
  2002-03-28  4:05     ` Neil Brown
  2002-03-28  4:28       ` Neil Brown
@ 2002-03-28  4:58       ` Andrew Theurer
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Theurer @ 2002-03-28  4:58 UTC (permalink / raw)
  Cc: nfs

> Ok, here is a theory.
> All your data is in cache, right?

Yes

> So when NFSD gets a read request, it hurtles along processing it and
> never has to block all the way until it is completed.  As this is done
> in less than the scheduling interval, it never gets interrupted and no
> other nfsd thread gets to run on that CPU until this one has finished
> it's request.
> This would give you a situation in which you never have more
> concurrent nfsd's than you have processors.

Yes, that seems right.  That actually would not be so bad.  At least I could
achive 100% cpu.

> But you are getting even worse than that:  You have only one active
> thread on a quad processor machine.  I blithely blame this on the
> scheduler.
> Maybe all the nfsd threads are initially attached to the same
> processor, and for some reason they never get migrated to another
> processor.   Is this a question for Ingo Molnar??

I think you are on to something here.  Actually, when monitoring top, I
always have exactly one CPU at 100%, while the others are at ~10%
(interrupts?).  Eventually the 100% moves from one processor to another.
Sounds like the nfsd threads all wake up and stay on the same processor.  I
need to look at how O(1) handles wake_up, and whether it puts all tasks on
the same runqueue, or distributes them amongst <num_cpu> runqueues.

> Maybe when you only have 2 nfsds, the dynamics are a bit different and
> one thread gets migrated and you then get dual processor performance.
>
> Possibly we could put some scheduler call in just after the wake_up()
> in svcsock.c which says to go bother some other CPU???
>
> Does that make any sense, or sound plausable?

Yes.  I could also retest with 4 nfsd threads, each bound to a unique CPU.
I have not looked at Ingo's latest cpu affinity work on O(1), but I'm sure
it can be done.  At some point I wanted to do some experimetation with
process and IRQ affinity anyway.  With samba/netbench I could get 25% better
performance.  In either case, I'll try this and list_add_tail in
svc_serv_enqueue.

> > I am going to setup SpecSFS for a comparison, but IMO I like this test
> > because it runs so quickly and exposes a potential problem.
> >
>
> That's what I was running.... but I had a bug is another
> under-development patch and it locked up chasing a circular list at
> about 2000 Ops/Sec (That is an unpublished number. I didn't publish
> it. You cannot say that I did).

I just got this started today, and I'm having a kernel panic on 1000 Ops in
2.5.7, and I'm not even out of the init phase.  Soon I'll have some more
data.  So far I did see that nfsd_busy get to 12!  However my runqueue
length was still 2 or less.  This obviuosly invloved disk IO (disk was
constantly going).  I need a little more time get get SFS running well to
figure out whats gong on.

Thanks,

-Andrew

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2002-03-28  4:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-03-26 19:22 more SMP issues Andrew Theurer
2002-03-27  2:59 ` Neil Brown
2002-03-27 16:37   ` Andrew Theurer
2002-03-28  4:05     ` Neil Brown
2002-03-28  4:28       ` Neil Brown
2002-03-28  4:58       ` Andrew Theurer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.