* rsockets and fork
@ 2012-08-13 23:12 Sridhar Samudrala
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-13 23:12 UTC (permalink / raw)
To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Sean,
I could not get fork-enabled netperf to work with rsockets in the latest
librdmacm git repository.
After some debugging, I found that the child netserver process is blocked at
the sem_wait() call in fork_passive().
It is not clear to me how this call is supposed to unblock, as sem_post()
is done later in the same function.
If I comment out sem_wait() and sem_post() in this routine, I can get a single
instance of netperf working with a forked netserver.
However, if I start another netperf instance while the first session is
still going on, it seems to hang and returns with a very low throughput.
It looks as if the first session is starving all the other sessions.
The right behavior would be for the available bandwidth to be split across
the parallel instances.
Thanks
Sridhar
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: rsockets and fork
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-13 23:39 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-13 23:39 UTC (permalink / raw)
To: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> I could not get fork-enabled netperf to work with rsockets in the latest
> librdmacm git repository.
> After some debugging, I found that the child netserver process is blocked at
> the sem_wait() call in fork_passive().
> It is not clear to me how this call is supposed to unblock, as sem_post()
> is done later in the same function.
sem_open() should create the semaphore with an initial value of 1. The sem_wait()/sem_post() calls serialize listening on the corresponding rsocket. The named semaphore should remain until the system is rebooted.
> However, if I start another netperf instance while the first session is
> still going on, it seems to hang and returns with a very low throughput.
> It looks as if the first session is starving all the other sessions.
> The right behavior would be for the available bandwidth to be split across
> the parallel instances.
Can you verify that the second session is using rsockets and not falling back to a socket connection?
You can try adjusting the polling time. This can usually be done by writing a value (time in microseconds) to the following file:
/etc/rdma/rsocket/polling_time
By default, the value is 10.
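For instance, assuming the path above exists on the installed system (a root shell is needed; the value 1000 is an arbitrary example, not a recommendation):

```shell
# Raise the rsocket polling time from the default of 10 microseconds.
echo 1000 > /etc/rdma/rsocket/polling_time
cat /etc/rdma/rsocket/polling_time   # verify the new value
```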
- Sean
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-14 17:51 ` Sridhar Samudrala
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-14 17:51 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 8/13/2012 4:39 PM, Hefty, Sean wrote:
>> I could not get fork-enabled netperf to work with rsockets in the latest
>> librdmacm git repository.
>> After some debugging, I found that the child netserver process is blocked at
>> the sem_wait() call in fork_passive().
>> It is not clear to me how this call is supposed to unblock, as sem_post()
>> is done later in the same function.
> sem_open() should create the semaphore with an initial value of 1. The sem_wait/sem_post calls serialize listening on the corresponding rsocket. The name semaphore should remain until the system is rebooted.
Looks like a stale semaphore issue. I removed /dev/shm/sem.rsocket_fork
and restarted netserver, and it is working now.
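For anyone hitting the same symptom, the cleanup amounts to removing the stale POSIX semaphore object (name taken from the message above; POSIX named semaphores appear under /dev/shm with a "sem." prefix on Linux) and restarting the server:

```shell
ls /dev/shm/sem.rsocket_fork   # confirm the stale object exists
rm /dev/shm/sem.rsocket_fork   # remove it, then restart netserver
```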
>
>> However, if I start another netperf instance while the first session is
>> still going on, it seems to hang and returns with a very low throughput.
>> It looks as if the first session is starving all the other sessions.
>> The right behavior would be for the available bandwidth to be split across
>> the parallel instances.
> Can you verify that the second session is using rsockets and not falling back to a socket connection?
Yes, it is also using rsockets.
The second session always hangs after sending a fixed number of bytes
(38469632).
rsend() blocks waiting for the CQ event.
Thanks
Sridhar
>
> You can try adjusting the polling time. This can usually be done by writing a value (time in microseconds) to the following file:
>
> /etc/rdma/rsocket/polling_time
>
> By default, the value is 10.
>
> - Sean
>
* RE: rsockets and fork
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2012-08-14 17:56 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-14 17:56 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Yes, it is also using rsockets.
> The second session always hangs after sending a fixed number of bytes
> (38469632).
> rsend() blocks waiting for the CQ event.
Can you send me the parameters that you use for testing?
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-14 18:18 ` Sridhar Samudrala
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-14 18:18 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, 2012-08-14 at 17:56 +0000, Hefty, Sean wrote:
> > Yes, it is also using rsockets.
> > The second session always hangs after sending a fixed number of bytes
> > (38469632).
> > rsend() blocks waiting for the CQ event.
>
> Can you send me the parameters that you use for testing?
This test is using Mellanox 10Gb RoCEE with MTU set to 9000
Server is started using
# ldr netserver -D
2 clients are started in 2 windows as follows.
# ldr netperf -v2 -c -C -H 192.168.0.22 -l10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536   65536    10.00     9679.80   2.58     2.59     0.349   0.351

Alignment      Offset         Bytes        Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered       Per                 Per
Send   Recv    Send   Recv                 Send (avg)          Recv (avg)
    8      8       0      0   12101877760  65536.00    184660  21967.51  550899

Maximum
Segment
Size (bytes)
 4096
# ldr netperf -v2 -c -C -H 192.168.0.22 -l10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536   65536    10.00       30.69   2.27     2.78    96.892  118.616

Alignment      Offset         Bytes        Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered       Per                 Per
Send   Recv    Send   Recv                 Send (avg)          Recv (avg)
    8      8       0      0      38469632  65536.00       587  21360.15    1801

Maximum
Segment
Size (bytes)
 4096
* RE: rsockets and fork
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-16 23:40 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-16 23:40 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> This test is using Mellanox 10Gb RoCEE with MTU set to 9000
>
> Server is started using
> # ldr netserver -D
>
> 2 clients are started in 2 windows as follows.
>
> # ldr netperf -v2 -c -C -H 192.168.0.22 -l10
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0
> AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 65536 65536 10.00 9679.80 2.58 2.59 0.349 0.351
>
> Alignment Offset Bytes Bytes Sends Bytes Recvs
> Local Remote Local Remote Xfered Per Per
> Send Recv Send Recv Send (avg) Recv (avg)
> 8 8 0 0 12101877760 65536.00 184660 21967.51 550899
>
> Maximum
> Segment
> Size (bytes)
> 4096
>
> # ldr netperf -v2 -c -C -H 192.168.0.22 -l10
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0
> AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 65536 65536 10.00 30.69 2.27 2.78 96.892 118.616
>
> Alignment Offset Bytes Bytes Sends Bytes Recvs
> Local Remote Local Remote Xfered Per Per
> Send Recv Send Recv Send (avg) Recv (avg)
> 8 8 0 0 38469632 65536.00 587 21360.15 1801
>
> Maximum
> Segment
> Size (bytes)
> 4096
I don't have RoCE installed. With IB, I haven't been able to see this problem after dozens of attempts. The performance isn't divided equally, but I usually see between 3-10 Gbps out of each connection. I'm running with some additional patches, but I don't see where those would affect any hang.
During development, I've seen issues where specific transfer patterns just happen to fall on boundary conditions that result in hangs or slowness. Hopefully that's not the case because those are a pain to identify.
Is this something that you would be able to use a debugger on? Dumping the contents of the struct rsocket * would be useful in troubleshooting if we're waiting on credits, buffer space, the app, or something else.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-17 21:40 ` Sridhar Samudrala
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-17 21:40 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Thu, 2012-08-16 at 23:40 +0000, Hefty, Sean wrote:
> I don't have RoCE installed. With IB, I haven't been able to see this problem after dozens of attempts. The performance isn't divided equally, but I usually see between 3-10 Gbps out of each connection. I'm running with some additional patches, but I don't see where those would affect any hang.
>
> During development, I've seen issues where specific transfer patterns just happen to fall on boundary conditions that result in hangs or slowness. Hopefully that's not the case because those are a pain to identify.
>
> Is this something that you would be able to use a debugger on? Dumping the contents of the struct rsocket * would be useful in troubleshooting if we're waiting on credits, buffer space, the app, or something else.
>
Here are dumps of struct rsocket from a hung netperf connection.
netserver: hung in rrecv()
(gdb) bt
#0 0x0000003286ed83f0 in __read_nocancel () from /lib64/libc.so.6
#1 0x0000003b7220a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
#2 0x00007fee00642197 in rs_get_cq_event (rs=0x18139c0) at src/rsocket.c:941
#3 0x00007fee00643567 in rs_process_cq (rs=0x18139c0, nonblock=0,
test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:989
#4 0x00007fee006438bd in rs_get_comp (rs=0x18139c0, nonblock=<value optimized out>,
test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:1019
#5 0x00007fee00643b51 in rrecv (socket=<value optimized out>, buf=0x17fdd10, len=87380,
flags=<value optimized out>) at src/rsocket.c:1136
#6 0x000000000042d6e3 in recv_data ()
#7 0x0000000000430da9 in recv_omni ()
#8 0x0000000000403628 in process_requests ()
#9 0x00000000004037b0 in spawn_child ()
#10 0x00000000004038f0 in accept_connection ()
#11 0x0000000000403a46 in accept_connections ()
#12 0x000000000040407a in main ()
(gdb) p *(struct rsocket *)0x18139c0
$1 = {cm_id = 0x1813b80, slock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, rlock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align = 549755813888}, cnt = 1},
cq_lock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align = 549755813888}, cnt = 1},
opts = 0, fd_flags = 2, so_opts = 4, tcp_opts = 2, ipv6_opts = 0, state = 1792, cq_armed = 1,
retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020, sbuf_bytes_avail = 131072,
sseq_no = 0, sseq_comp = 1024, sq_size = 1024, sq_inline = 64, rq_size = 1024, rseq_no = 3515,
rseq_comp = 4026, rbuf_bytes_avail = 63488, rbuf_free_offset = 65536, rbuf_offset = 129024,
rmsg_head = 440, rmsg_tail = 440, rmsg = 0x1813dd0, remote_sge = 1, remote_sgl = {addr = 6784856,
key = 2550145917, length = 2}, target_mr = 0x1815e60, target_sge = 0, target_sgl = {{
addr = 140617248624656, key = 2550146173, length = 65536}, {addr = 140617248690192,
key = 2550146173, length = 65536}}, rbuf_size = 131072, rmr = 0x17dd680,
rbuf = 0x7fee00347010 "netperf", sbuf_size = 131072, smr = 0x17dffd0, ssgl = {{
addr = 140660182515728, length = 0, lkey = 3758105205}, {addr = 140660182515728, length = 0,
lkey = 3758105205}}, sbuf = 0x7fee00368010 ""}
netperf: hung in rsend()
(gdb) bt
#0 0x0000003120ad83f0 in __read_nocancel () from /lib64/libc.so.6
#1 0x0000003da700a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
#2 0x00007fe40157f197 in rs_get_cq_event (rs=0x678610) at src/rsocket.c:941
#3 0x00007fe401580567 in rs_process_cq (rs=0x678610, nonblock=0,
test=0x7fe40157e430 <rs_conn_can_send>) at src/rsocket.c:989
#4 0x00007fe4015808bd in rs_get_comp (rs=0x678610,
nonblock=<value optimized out>, test=0x7fe40157e430 <rs_conn_can_send>)
at src/rsocket.c:1019
#5 0x00007fe401581b92 in rsend (socket=<value optimized out>,
buf=<value optimized out>, len=65536, flags=<value optimized out>)
at src/rsocket.c:1244
#6 0x000000000042ba6d in send_data ()
#7 0x000000000042d158 in send_omni_inner ()
#8 0x000000000042fbf1 in send_tcp_stream ()
#9 0x000000000040239d in main ()
(gdb) p *(struct rsocket *)0x678610
$1 = {cm_id = 0x6a2af0, slock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 1}, rlock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 1}, opts = 0, fd_flags = 2, so_opts = 4,
tcp_opts = 0, ipv6_opts = 0, state = 1792, cq_armed = 1, retries = 0, err = 0,
index = 12, ctrl_avail = 4, sqe_avail = 1020, sbuf_bytes_avail = 131072,
sseq_no = 3522, sseq_comp = 4538, sq_size = 1024, sq_inline = 64,
rq_size = 1024, rseq_no = 0, rseq_comp = 512, rbuf_bytes_avail = 0,
rbuf_free_offset = 0, rbuf_offset = 0, rmsg_head = 0, rmsg_tail = 0,
rmsg = 0x6a2d90, remote_sge = 0, remote_sgl = {addr = 25246472,
key = 3758105461, length = 2}, target_mr = 0x6a4e20, target_sge = 1,
target_sgl = {{addr = 140660182446096, key = 3758105717, length = 0}, {
addr = 140660182511632, key = 3758105717, length = 0}}, rbuf_size = 131072,
rmr = 0x677c80, rbuf = 0x7fe401275010 "", sbuf_size = 131072, smr = 0x678980,
ssgl = {{addr = 140617248825360, length = 2048, lkey = 2550145661}, {
addr = 140617248759824, length = 0, lkey = 2550145661}},
sbuf = 0x7fe401296010 "netperf"}
* RE: rsockets and fork
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-17 22:17 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-17 22:17 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Thanks - this helps... a little.
The sender is waiting for the receiver to publish additional receive buffer space.
> (gdb) bt
> #0 0x0000003286ed83f0 in __read_nocancel () from /lib64/libc.so.6
> #1 0x0000003b7220a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
> #2 0x00007fee00642197 in rs_get_cq_event (rs=0x18139c0) at src/rsocket.c:941
> #3 0x00007fee00643567 in rs_process_cq (rs=0x18139c0, nonblock=0,
> test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:989
> #4 0x00007fee006438bd in rs_get_comp (rs=0x18139c0, nonblock=<value optimized
> out>,
> test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:1019
> #5 0x00007fee00643b51 in rrecv (socket=<value optimized out>, buf=0x17fdd10,
> len=87380,
> flags=<value optimized out>) at src/rsocket.c:1136
> #6 0x000000000042d6e3 in recv_data ()
> #7 0x0000000000430da9 in recv_omni ()
> #8 0x0000000000403628 in process_requests ()
> #9 0x00000000004037b0 in spawn_child ()
> #10 0x00000000004038f0 in accept_connection ()
> #11 0x0000000000403a46 in accept_connections ()
> #12 0x000000000040407a in main ()
> (gdb) p *(struct rsocket *)0x18139c0
> $1 = {cm_id = 0x1813b80, slock = {sem = {__size = "\000\000\000\000\200",
> '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, rlock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align =
> 549755813888}, cnt = 1},
> cq_lock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align =
> 549755813888}, cnt = 1},
> opts = 0, fd_flags = 2, so_opts = 4, tcp_opts = 2, ipv6_opts = 0, state =
> 1792, cq_armed = 1,
> retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
^^^^^^^^^^^^^^
This looks like part of the problem. There should be 4 control messages available by default. The receiver has sent a control message to the sender, but the control message never completed at the receiver. My guess is that the sender never received it either. It may be the missing buffer update, which would be an RDMA write into the sender's 'target_sgl' array.
Have you ever been able to reproduce this problem without using fork() support?
> sbuf_bytes_avail = 131072,
> sseq_no = 0, sseq_comp = 1024, sq_size = 1024, sq_inline = 64, rq_size =
> 1024, rseq_no = 3515,
> rseq_comp = 4026, rbuf_bytes_avail = 63488, rbuf_free_offset = 65536,
> rbuf_offset = 129024,
> rmsg_head = 440, rmsg_tail = 440, rmsg = 0x1813dd0, remote_sge = 1,
> remote_sgl = {addr = 6784856,
> key = 2550145917, length = 2}, target_mr = 0x1815e60, target_sge = 0,
> target_sgl = {{
> addr = 140617248624656, key = 2550146173, length = 65536}, {addr =
> 140617248690192,
> key = 2550146173, length = 65536}}, rbuf_size = 131072, rmr = 0x17dd680,
> rbuf = 0x7fee00347010 "netperf", sbuf_size = 131072, smr = 0x17dffd0, ssgl =
> {{
> addr = 140660182515728, length = 0, lkey = 3758105205}, {addr =
> 140660182515728, length = 0,
> lkey = 3758105205}}, sbuf = 0x7fee00368010 ""}
>
> netperf: hang in send()
> (gdb) bt
> #0 0x0000003120ad83f0 in __read_nocancel () from /lib64/libc.so.6
> #1 0x0000003da700a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
> #2 0x00007fe40157f197 in rs_get_cq_event (rs=0x678610) at src/rsocket.c:941
> #3 0x00007fe401580567 in rs_process_cq (rs=0x678610, nonblock=0,
> test=0x7fe40157e430 <rs_conn_can_send>) at src/rsocket.c:989
> #4 0x00007fe4015808bd in rs_get_comp (rs=0x678610,
> nonblock=<value optimized out>, test=0x7fe40157e430 <rs_conn_can_send>)
> at src/rsocket.c:1019
> #5 0x00007fe401581b92 in rsend (socket=<value optimized out>,
> buf=<value optimized out>, len=65536, flags=<value optimized out>)
> at src/rsocket.c:1244
> #6 0x000000000042ba6d in send_data ()
> #7 0x000000000042d158 in send_omni_inner ()
> #8 0x000000000042fbf1 in send_tcp_stream ()
> #9 0x000000000040239d in main ()
> (gdb) p *(struct rsocket *)0x678610
> $1 = {cm_id = 0x6a2af0, slock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 1}, rlock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 1}, opts = 0, fd_flags = 2, so_opts = 4,
> tcp_opts = 0, ipv6_opts = 0, state = 1792, cq_armed = 1, retries = 0, err =
> 0,
> index = 12, ctrl_avail = 4, sqe_avail = 1020, sbuf_bytes_avail = 131072,
> sseq_no = 3522, sseq_comp = 4538, sq_size = 1024, sq_inline = 64,
> rq_size = 1024, rseq_no = 0, rseq_comp = 512, rbuf_bytes_avail = 0,
> rbuf_free_offset = 0, rbuf_offset = 0, rmsg_head = 0, rmsg_tail = 0,
> rmsg = 0x6a2d90, remote_sge = 0, remote_sgl = {addr = 25246472,
> key = 3758105461, length = 2}, target_mr = 0x6a4e20, target_sge = 1,
> target_sgl = {{addr = 140660182446096, key = 3758105717, length = 0}, {
> addr = 140660182511632, key = 3758105717, length = 0}}, rbuf_size =
The sender is stuck waiting for the target_sgl to be updated. One of the target_sgl entries needs a length > 0.
> 131072,
> rmr = 0x677c80, rbuf = 0x7fe401275010 "", sbuf_size = 131072, smr = 0x678980,
> ssgl = {{addr = 140617248825360, length = 2048, lkey = 2550145661}, {
> addr = 140617248759824, length = 0, lkey = 2550145661}},
> sbuf = 0x7fe401296010 "netperf"}
>
I'll see if I can find anything in the code that would account for this. I don't understand why we're missing a completion, or what causes this to occur after sending a specific number of bytes (38469632). That number of bytes is evenly divisible by 64k, though, which is half the size of the send/receive buffers and puts us right on a boundary condition. :/
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-17 22:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-17 22:36 UTC (permalink / raw)
To: Hefty, Sean, Sridhar Samudrala
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
>
> ^^^^^^^^^^^^^^
> This looks like part of the problem. There should be 4 control messages
> available by default. The receiver has sent a control message to the sender,
> but the control message never completed at the receiver. My guess is that the
> sender never received it either. It may be the missing buffer update, which
> would be an RDMA write into the sender's 'target_sgl' array.
>
> Have you ever been able to reproduce this problem without using fork() support?
A simple check to add is whether the rs_post_write() call ever fails, particularly in rs_send_credits(). Checking for an error from ibv_get_cq_event() in rs_get_cq_event() may also be useful. In neither case do I expect an error, but we could confirm it.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-18 0:34 ` Sridhar Samudrala
2012-08-22 23:35 ` Hefty, Sean
1 sibling, 0 replies; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-18 0:34 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Fri, 2012-08-17 at 22:36 +0000, Hefty, Sean wrote:
> > > retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
> >
> > ^^^^^^^^^^^^^^
> > This looks like part of the problem. There should be 4 control messages
> > available by default. The receiver has sent a control message to the sender,
> > but the control message never completed at the receiver. My guess is that the
> > sender never received it either. It may be the missing buffer update, which
> > would be an RDMA write into the sender's 'target_sgl' array.
> >
> > Have you ever been able to reproduce this problem without using fork() support?
>
> A simple check to add is whether the rs_post_write() call ever fails, but in rs_send_credits() in particular. Checking for an error from ibv_get_cq_event() in rs_get_cq_event() may also be useful. In neither case do I expect an error, but we could confirm it.
I didn't see any errors from these calls.
So far I have been able to consistently reproduce this hang only when fork support is enabled.
I am seeing this issue randomly, even with a simple TCP client/server test, but only with fork.
Thanks
Sridhar
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-18 0:34 ` Sridhar Samudrala
@ 2012-08-22 23:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-22 23:35 UTC (permalink / raw)
To: Sridhar Samudrala, roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
I haven't identified the specific problem with fork support, but I did see this in libmlx4:
mlx4_alloc_context()
{
	...
	context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
			    MAP_SHARED, cmd_fd, 0);
	if (context->uar == MAP_FAILED)
		goto err_free;

	if (resp.bf_reg_size) {
		context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
					PROT_WRITE, MAP_SHARED, cmd_fd,
					to_mdev(ibdev)->page_size);
	...
}
I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
I added the following hack to the preload socket() call, which skips calling rsocket().
	if ((domain == PF_INET || domain == PF_INET6) &&
	    (type == SOCK_STREAM) && (!protocol || protocol == IPPROTO_TCP) &&
	    fork_support) {
		printf("skipping rsocket call\n");
		goto realsock;
	}

	recursive = 1;
	ret = rsocket(domain, type, protocol);
	recursive = 0;
	if (ret >= 0) {
		if (fork_support) {
			rclose(ret);
realsock:
			ret = real.socket(domain, type, protocol);
With the above hack, I no longer see hangs when running with fork(), and the netperf performance is split across the instances. This should delay initializing libibverbs until after fork() has been called.
I'm trying to decide whether to keep this work-around or find a better solution.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-23 0:30 ` Sridhar Samudrala
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-24 18:22 ` Roland Dreier
1 sibling, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-23 0:30 UTC (permalink / raw)
To: Hefty, Sean
Cc: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, 2012-08-22 at 23:35 +0000, Hefty, Sean wrote:
> I haven't identified the specific problem with fork support, but I did see this in libmlx4:
>
> mlx4_alloc_context()
> {
> ...
> context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
> MAP_SHARED, cmd_fd, 0);
> if (context->uar == MAP_FAILED)
> goto err_free;
>
> if (resp.bf_reg_size) {
> context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
> PROT_WRITE, MAP_SHARED, cmd_fd,
> to_mdev(ibdev)->page_size);
> ...
> }
>
> I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
>
> I added the following hack to the preload socket() call, which skips calling rsocket().
>
> if ((domain == PF_INET || domain == PF_INET6) &&
> (type == SOCK_STREAM) && (!protocol || protocol == IPPROTO_TCP) && fork_support) {
> printf("skipping rsocket call\n");
> goto realsock;
> }
>
> recursive = 1;
> ret = rsocket(domain, type, protocol);
> recursive = 0;
> if (ret >= 0) {
> if (fork_support) {
> rclose(ret);
> realsock:
> ret = real.socket(domain, type, protocol);
>
> With the above hack, I no longer see hangs when running with fork() and the netperf performance is split. This should delay libibverbs initializing until after fork() has been called.
Yes, this fixes the hangs I am seeing with netperf too.
I saw this code in the preload library and was wondering why rsocket() is called
and closed immediately when fork_support is enabled. I guess you are doing this
so that you can fall back to a real socket at the initial socket() call instead
of waiting all the way until fork_active()/fork_passive(). This should be OK,
specifically if a user is using the preload library and enabling fork_support.
This doesn't look like a hack to me. However, it would be a bug if rsocket()
followed by rclose() doesn't clean up all the resources correctly.
Thanks
Sridhar
> I'm trying to decide whether to keep this work-around or find a better solution.
>
> - Sean
* RE: rsockets and fork
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-23 1:32 ` Hefty, Sean
0 siblings, 0 replies; 19+ messages in thread
From: Hefty, Sean @ 2012-08-23 1:32 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> I saw this code in the preload library and was wondering why rsocket() is called
> and closed immediately when fork_support is enabled. I guess you are doing this
> so that you can fall back to a real socket at the initial socket() call instead
> of waiting all the way until fork_active/fork_passive. This should be OK,
> especially if a user is using the preload library and enabling fork_support.
The initial call to rsocket() is made to see if that socket type is enabled. This way I don't need to update both rsocket and the preload library when new socket types are supported. So, yes, it allows us to fall back to a normal socket quickly if rsockets does not support the requested type.
> This doesn't look like a hack to me. However, it does look like a bug if rsocket()
> followed by rclose() doesn't clean up all the resources correctly.
I can come up with something a little cleaner, which I think would work okay. I'm just wondering if this isn't really an issue with libmlx4 supporting fork(). I just need some way to confirm that this is the cause of the hangs.
Regardless, I'll try to push a patch with this fix by the end of the week.
Thanks for your help!
- Sean
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-23 0:30 ` Sridhar Samudrala
@ 2012-08-24 18:22 ` Roland Dreier
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2012-08-24 18:22 UTC (permalink / raw)
To: Hefty, Sean
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, Aug 22, 2012 at 4:35 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I'm haven't identified the specific problem with fork support, but I did see this in libmlx4:
>
> mlx4_alloc_context()
> {
> ...
> context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
> MAP_SHARED, cmd_fd, 0);
> if (context->uar == MAP_FAILED)
> goto err_free;
>
> if (resp.bf_reg_size) {
> context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
> PROT_WRITE, MAP_SHARED, cmd_fd,
> to_mdev(ibdev)->page_size);
> ...
> }
>
> I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
I don't think those mmap()s should be an issue with fork.... they are
mapping adapter PCI space into userspace, but it should work across
fork.
- R.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: rsockets and fork
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-24 18:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-24 18:35 UTC (permalink / raw)
To: Roland Dreier
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> I don't think those mmap()s should be an issue with fork.... they are
> mapping adapter PCI space into userspace, but it should work across
> fork.
makes sense
Do you have any ideas on ways to identify what in the initialization paths might cause the problems? (Assuming that is where the problem is.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-11-29 16:09 ` Or Gerlitz
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Or Gerlitz @ 2012-11-29 16:09 UTC (permalink / raw)
To: Hefty, Sean
Cc: Roland Dreier, Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Fri, Aug 24, 2012 at 8:35 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> I don't think those mmap()s should be an issue with fork.... they are
>> mapping adapter PCI space into userspace, but it should work across fork.
> makes sense
> Do you have any ideas on ways to identify what in the initialization paths might cause the problems? (Assuming that is where the problem is.)
Hi Sean,
Was there progress on using rsockets under netperf, or with fork in general?
Do you think that, for netperf and maybe other fork use cases, rsockets
could use CM redirect (e.g. through the rdma-cm supporting rdma_redirect)
within the server to bypass the fork limitation?
Or.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: rsockets and fork
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-29 16:27 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-11-29 16:27 UTC (permalink / raw)
To: Or Gerlitz
Cc: Roland Dreier, Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Was there progress on using rsockets under netperf, or with fork in general?
Fork support in rsockets is available now, and netperf should work. It's not generic enough to handle any arbitrary call to fork(), but it will work if the app does something like this:
listen()
s = accept()
fork(s)
> Do you think for the netperf and maybe other uses case of fork rsockets can use
> CM redirect (e.g through the rdma-cm supporting rdma_redirect - within
> the server to bypass the fork limitation?
The last problem I hit was trying to support vsftpd, which calls chroot(). Handling chroot requires kernel changes. But between fork, fstat, chroot, dup2, epoll, and other calls, a kernel implementation of rsockets (or something similar) looks more appealing for apps that favor full compatibility over performance.
- Sean
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-12-02 12:20 ` Or Gerlitz
2012-12-02 12:20 ` Or Gerlitz
1 sibling, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2012-12-02 12:20 UTC (permalink / raw)
To: Hefty, Sean
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 29/11/2012 18:27, Hefty, Sean wrote:
> Fork support in rsockets is available now, and netperf should work.
Sean,
Trying the latest librdmacm git (plus the patch I sent to have it build
OK), we weren't able to run netperf. Here's the server output:
> starting netserver with host '192.168.20.126' port '12865' and family
> AF_UNSPEC
> accept_connections: enter
> set_fdset: enter list 0x22ea840 fd_set 0x7fff7134e040
> setting 3 in fdset
> accept_connection: enter
> spawn_child: enter
> close_listens: enter
> set_fdset: enter list 0x22ea840 fd_set 0x7fff7134e040
> setting 3 in fdset
and this is the client output
> resolve_host called with host '192.168.20.126' port '(null)' family
> AF_UNSPEC
> getaddrinfo returned the following for host '192.168.20.126' port
> '(null)' family AF_UNSPEC
> cannonical name: '192.168.20.126'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 0 0 192 168 20 126 0 0 0 0 0 0 0 0 0 0
> scan_omni_args called with the following argument vector
> netperf -v 2 -d -H 192.168.20.126
> Program name: netperf
> Local send alignment: 8
> Local recv alignment: 8
> Remote send alignment: 8
> Remote recv alignment: 8
> Report local CPU 0
> Report remote CPU 0
> Verbosity: 2
> Debug: 1
> Port: 12865
> Test name: TCP_STREAM
> Test bytes: 0 Test time: 10 Test trans: 0
> Host name: 192.168.20.126
>
> installing catcher for all signals
> Could not install signal catcher for sig 32, errno 22
> Could not install signal catcher for sig 33, errno 22
> Could not install signal catcher for sig 65, errno 22
> remotehost is 192.168.20.126 and port 12865
> resolve_host called with host '192.168.20.126' port '12865' family AF_INET
> getaddrinfo returned the following for host '192.168.20.126' port
> '12865' family AF_INET
> cannonical name: '192.168.20.126'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 50 65 192 168 20 126 0 0 0 0 0 0 0
> 0 0 0
> resolve_host called with host '0.0.0.0' port '0' family AF_UNSPEC
> getaddrinfo returned the following for host '0.0.0.0' port '0' family
> AF_UNSPEC
> cannonical name: '0.0.0.0'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> establish_control called with host '192.168.20.126' port '12865'
> remfam AF_INET
> local '0.0.0.0' port '0' locfam AF_UNSPEC
> bound control socket to 0.0.0.0 and 0
> successful connection to remote netserver at 192.168.20.126 and 12865
> recv_response: received a 0 byte response
> recv_response: partial response received: 0 bytes
We used netperf version 2.5.0.
Or.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads: [~2012-12-02 12:20 UTC | newest]
Thread overview: 19+ messages
-- links below jump to the message on this page --
2012-08-13 23:12 rsockets and fork Sridhar Samudrala
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-13 23:39 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-14 17:51 ` Sridhar Samudrala
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-08-14 17:56 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-14 18:18 ` Sridhar Samudrala
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-16 23:40 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-17 21:40 ` Sridhar Samudrala
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-17 22:17 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-17 22:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-18 0:34 ` Sridhar Samudrala
2012-08-22 23:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-23 0:30 ` Sridhar Samudrala
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-23 1:32 ` Hefty, Sean
2012-08-24 18:22 ` Roland Dreier
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-24 18:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-11-29 16:09 ` Or Gerlitz
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-29 16:27 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-12-02 12:20 ` Or Gerlitz
2012-12-02 12:20 ` Or Gerlitz