* rsockets and fork
@ 2012-08-13 23:12 Sridhar Samudrala
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-13 23:12 UTC (permalink / raw)
To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Sean,
I could not get fork-enabled netperf to work with rsockets in the latest
librdmacm git repository.
After some debugging, I found that the child netserver process is blocked at
the sem_wait() call in fork_passive().
It is not clear to me how this call is supposed to unblock, as sem_post()
is done later in the same function.
If I comment out sem_wait() and sem_post() in this routine, I can get a single
instance of netperf working with a forked netserver.
However, if I start another netperf instance while the first session is
still going on, it seems to hang and returns with a very low throughput.
It looks as if the first session is starving all the other sessions.
The right behavior would be for the available bandwidth to be split across
the parallel instances.
Thanks
Sridhar
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: rsockets and fork
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-13 23:39 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-13 23:39 UTC (permalink / raw)
To: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> I could not get fork-enabled netperf to work with rsockets in the latest
> librdmacm git repository.
> After some debugging, I found that the child netserver process is blocked at
> the sem_wait() call in fork_passive().
> It is not clear to me how this call is supposed to unblock, as sem_post()
> is done later in the same function.
sem_open() should create the semaphore with an initial value of 1. The sem_wait()/sem_post() calls serialize listening on the corresponding rsocket. The named semaphore should remain until the system is rebooted.
> However, if I start another netperf instance while the first session is
> still going on, it seems to hang and returns with a very low throughput.
> It looks as if the first session is starving all the other sessions.
> The right behavior would be for the available bandwidth to be split across
> the parallel instances.
Can you verify that the second session is using rsockets and not falling back to a socket connection?
You can try adjusting the polling time. This can usually be done by writing a value (time in microseconds) to the following file:
/etc/rdma/rsocket/polling_time
By default, the value is 10.
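For instance, assuming the path above exists on the installed system (a root shell is needed; the value 1000 is an arbitrary example, not a recommendation):

```shell
# Raise the rsocket polling time from the default of 10 microseconds.
echo 1000 > /etc/rdma/rsocket/polling_time
cat /etc/rdma/rsocket/polling_time   # verify the new value
```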
- Sean
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-14 17:51 ` Sridhar Samudrala
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-14 17:51 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 8/13/2012 4:39 PM, Hefty, Sean wrote:
>> I could not get fork-enabled netperf to work with rsockets in the latest
>> librdmacm git repository.
>> After some debugging, I found that the child netserver process is blocked at
>> the sem_wait() call in fork_passive().
>> It is not clear to me how this call is supposed to unblock, as sem_post()
>> is done later in the same function.
> sem_open() should create the semaphore with an initial value of 1. The sem_wait/sem_post calls serialize listening on the corresponding rsocket. The name semaphore should remain until the system is rebooted.
Looks like a stale semaphore issue. I removed /dev/shm/sem.rsocket_fork
and restarted netserver, and it is working now.
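For anyone hitting the same symptom, the cleanup amounts to removing the stale POSIX semaphore object (name taken from the message above; POSIX named semaphores appear under /dev/shm with a "sem." prefix on Linux) and restarting the server:

```shell
ls /dev/shm/sem.rsocket_fork   # confirm the stale object exists
rm /dev/shm/sem.rsocket_fork   # remove it, then restart netserver
```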
>
>> However, if I start another netperf instance while the first session is
>> still going on, it seems to hang and returns with a very low throughput.
>> It looks as if the first session is starving all the other sessions.
>> The right behavior would be for the available bandwidth to be split across
>> the parallel instances.
> Can you verify that the second session is using rsockets and not falling back to a socket connection?
Yes, it is also using rsockets.
The second session always hangs after sending a fixed number of bytes
(38469632).
rsend() blocks waiting for the CQ event.
Thanks
Sridhar
>
> You can try adjusting the polling time. This can usually be done by writing a value (time in microseconds) to the following file:
>
> /etc/rdma/rsocket/polling_time
>
> By default, the value is 10.
>
> - Sean
>
* RE: rsockets and fork
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2012-08-14 17:56 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-14 17:56 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Yes, it is also using rsockets.
> The second session always hangs after sending a fixed number of bytes
> (38469632).
> rsend() blocks waiting for the CQ event.
Can you send me the parameters that you use for testing?
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-14 18:18 ` Sridhar Samudrala
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-14 18:18 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, 2012-08-14 at 17:56 +0000, Hefty, Sean wrote:
> > Yes, it is also using rsockets.
> > The second session always hangs after sending a fixed number of bytes
> > (38469632).
> > rsend() blocks waiting for the CQ event.
>
> Can you send me the parameters that you use for testing?
This test is using Mellanox 10Gb RoCEE with MTU set to 9000
Server is started using
# ldr netserver -D
2 clients are started in 2 windows as follows.
# ldr netperf -v2 -c -C -H 192.168.0.22 -l10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536   65536    10.00     9679.80   2.58     2.59     0.349   0.351

Alignment      Offset         Bytes        Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered       Per                 Per
Send   Recv    Send   Recv                 Send (avg)          Recv (avg)
    8      8       0      0   12101877760  65536.00    184660  21967.51  550899

Maximum
Segment
Size (bytes)
 4096
# ldr netperf -v2 -c -C -H 192.168.0.22 -l10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536   65536    10.00       30.69   2.27     2.78    96.892  118.616

Alignment      Offset         Bytes        Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered       Per                 Per
Send   Recv    Send   Recv                 Send (avg)          Recv (avg)
    8      8       0      0      38469632  65536.00       587  21360.15    1801

Maximum
Segment
Size (bytes)
 4096
* RE: rsockets and fork
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-16 23:40 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-16 23:40 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> This test is using Mellanox 10Gb RoCEE with MTU set to 9000
>
> Server is started using
> # ldr netserver -D
>
> 2 clients are started in 2 windows as follows.
>
> # ldr netperf -v2 -c -C -H 192.168.0.22 -l10
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0
> AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 65536 65536 10.00 9679.80 2.58 2.59 0.349 0.351
>
> Alignment Offset Bytes Bytes Sends Bytes Recvs
> Local Remote Local Remote Xfered Per Per
> Send Recv Send Recv Send (avg) Recv (avg)
> 8 8 0 0 12101877760 65536.00 184660 21967.51 550899
>
> Maximum
> Segment
> Size (bytes)
> 4096
>
> # ldr netperf -v2 -c -C -H 192.168.0.22 -l10
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 (192.168.0.22) port 0
> AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 65536 65536 10.00 30.69 2.27 2.78 96.892 118.616
>
> Alignment Offset Bytes Bytes Sends Bytes Recvs
> Local Remote Local Remote Xfered Per Per
> Send Recv Send Recv Send (avg) Recv (avg)
> 8 8 0 0 38469632 65536.00 587 21360.15 1801
>
> Maximum
> Segment
> Size (bytes)
> 4096
I don't have RoCE installed. With IB, I haven't been able to see this problem after dozens of attempts. The performance isn't divided equally, but I usually see between 3-10 Gbps out of each connection. I'm running with some additional patches, but I don't see where those would affect any hang.
During development, I've seen issues where specific transfer patterns just happen to fall on boundary conditions that result in hangs or slowness. Hopefully that's not the case because those are a pain to identify.
Is this something that you would be able to use a debugger on? Dumping the contents of the struct rsocket * would be useful in troubleshooting if we're waiting on credits, buffer space, the app, or something else.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-17 21:40 ` Sridhar Samudrala
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-17 21:40 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Thu, 2012-08-16 at 23:40 +0000, Hefty, Sean wrote:
> I don't have RoCE installed. With IB, I haven't been able to see this problem after dozens of attempts. The performance isn't divided equally, but I usually see between 3-10 Gbps out of each connection. I'm running with some additional patches, but I don't see where those would affect any hang.
>
> During development, I've seen issues where specific transfer patterns just happen to fall on boundary conditions that result in hangs or slowness. Hopefully that's not the case because those are a pain to identify.
>
> Is this something that you would be able to use a debugger on? Dumping the contents of the struct rsocket * would be useful in troubleshooting if we're waiting on credits, buffer space, the app, or something else.
>
Here are dumps of struct rsocket from a hung netperf connection.
netserver: hung in rrecv()
(gdb) bt
#0 0x0000003286ed83f0 in __read_nocancel () from /lib64/libc.so.6
#1 0x0000003b7220a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
#2 0x00007fee00642197 in rs_get_cq_event (rs=0x18139c0) at src/rsocket.c:941
#3 0x00007fee00643567 in rs_process_cq (rs=0x18139c0, nonblock=0,
test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:989
#4 0x00007fee006438bd in rs_get_comp (rs=0x18139c0, nonblock=<value optimized out>,
test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:1019
#5 0x00007fee00643b51 in rrecv (socket=<value optimized out>, buf=0x17fdd10, len=87380,
flags=<value optimized out>) at src/rsocket.c:1136
#6 0x000000000042d6e3 in recv_data ()
#7 0x0000000000430da9 in recv_omni ()
#8 0x0000000000403628 in process_requests ()
#9 0x00000000004037b0 in spawn_child ()
#10 0x00000000004038f0 in accept_connection ()
#11 0x0000000000403a46 in accept_connections ()
#12 0x000000000040407a in main ()
(gdb) p *(struct rsocket *)0x18139c0
$1 = {cm_id = 0x1813b80, slock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, rlock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align = 549755813888}, cnt = 1},
cq_lock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align = 549755813888}, cnt = 1},
opts = 0, fd_flags = 2, so_opts = 4, tcp_opts = 2, ipv6_opts = 0, state = 1792, cq_armed = 1,
retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020, sbuf_bytes_avail = 131072,
sseq_no = 0, sseq_comp = 1024, sq_size = 1024, sq_inline = 64, rq_size = 1024, rseq_no = 3515,
rseq_comp = 4026, rbuf_bytes_avail = 63488, rbuf_free_offset = 65536, rbuf_offset = 129024,
rmsg_head = 440, rmsg_tail = 440, rmsg = 0x1813dd0, remote_sge = 1, remote_sgl = {addr = 6784856,
key = 2550145917, length = 2}, target_mr = 0x1815e60, target_sge = 0, target_sgl = {{
addr = 140617248624656, key = 2550146173, length = 65536}, {addr = 140617248690192,
key = 2550146173, length = 65536}}, rbuf_size = 131072, rmr = 0x17dd680,
rbuf = 0x7fee00347010 "netperf", sbuf_size = 131072, smr = 0x17dffd0, ssgl = {{
addr = 140660182515728, length = 0, lkey = 3758105205}, {addr = 140660182515728, length = 0,
lkey = 3758105205}}, sbuf = 0x7fee00368010 ""}
netperf: hung in rsend()
(gdb) bt
#0 0x0000003120ad83f0 in __read_nocancel () from /lib64/libc.so.6
#1 0x0000003da700a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
#2 0x00007fe40157f197 in rs_get_cq_event (rs=0x678610) at src/rsocket.c:941
#3 0x00007fe401580567 in rs_process_cq (rs=0x678610, nonblock=0,
test=0x7fe40157e430 <rs_conn_can_send>) at src/rsocket.c:989
#4 0x00007fe4015808bd in rs_get_comp (rs=0x678610,
nonblock=<value optimized out>, test=0x7fe40157e430 <rs_conn_can_send>)
at src/rsocket.c:1019
#5 0x00007fe401581b92 in rsend (socket=<value optimized out>,
buf=<value optimized out>, len=65536, flags=<value optimized out>)
at src/rsocket.c:1244
#6 0x000000000042ba6d in send_data ()
#7 0x000000000042d158 in send_omni_inner ()
#8 0x000000000042fbf1 in send_tcp_stream ()
#9 0x000000000040239d in main ()
(gdb) p *(struct rsocket *)0x678610
$1 = {cm_id = 0x6a2af0, slock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 1}, rlock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
__align = 549755813888}, cnt = 1}, opts = 0, fd_flags = 2, so_opts = 4,
tcp_opts = 0, ipv6_opts = 0, state = 1792, cq_armed = 1, retries = 0, err = 0,
index = 12, ctrl_avail = 4, sqe_avail = 1020, sbuf_bytes_avail = 131072,
sseq_no = 3522, sseq_comp = 4538, sq_size = 1024, sq_inline = 64,
rq_size = 1024, rseq_no = 0, rseq_comp = 512, rbuf_bytes_avail = 0,
rbuf_free_offset = 0, rbuf_offset = 0, rmsg_head = 0, rmsg_tail = 0,
rmsg = 0x6a2d90, remote_sge = 0, remote_sgl = {addr = 25246472,
key = 3758105461, length = 2}, target_mr = 0x6a4e20, target_sge = 1,
target_sgl = {{addr = 140660182446096, key = 3758105717, length = 0}, {
addr = 140660182511632, key = 3758105717, length = 0}}, rbuf_size = 131072,
rmr = 0x677c80, rbuf = 0x7fe401275010 "", sbuf_size = 131072, smr = 0x678980,
ssgl = {{addr = 140617248825360, length = 2048, lkey = 2550145661}, {
addr = 140617248759824, length = 0, lkey = 2550145661}},
sbuf = 0x7fe401296010 "netperf"}
* RE: rsockets and fork
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-17 22:17 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-17 22:17 UTC (permalink / raw)
To: Sridhar Samudrala; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Thanks - this helps... a little.
The sender is waiting for the receiver to publish additional receive buffer space.
> (gdb) bt
> #0 0x0000003286ed83f0 in __read_nocancel () from /lib64/libc.so.6
> #1 0x0000003b7220a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
> #2 0x00007fee00642197 in rs_get_cq_event (rs=0x18139c0) at src/rsocket.c:941
> #3 0x00007fee00643567 in rs_process_cq (rs=0x18139c0, nonblock=0,
> test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:989
> #4 0x00007fee006438bd in rs_get_comp (rs=0x18139c0, nonblock=<value optimized
> out>,
> test=0x7fee006414b0 <rs_conn_have_rdata>) at src/rsocket.c:1019
> #5 0x00007fee00643b51 in rrecv (socket=<value optimized out>, buf=0x17fdd10,
> len=87380,
> flags=<value optimized out>) at src/rsocket.c:1136
> #6 0x000000000042d6e3 in recv_data ()
> #7 0x0000000000430da9 in recv_omni ()
> #8 0x0000000000403628 in process_requests ()
> #9 0x00000000004037b0 in spawn_child ()
> #10 0x00000000004038f0 in accept_connection ()
> #11 0x0000000000403a46 in accept_connections ()
> #12 0x000000000040407a in main ()
> (gdb) p *(struct rsocket *)0x18139c0
> $1 = {cm_id = 0x1813b80, slock = {sem = {__size = "\000\000\000\000\200",
> '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, rlock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align =
> 549755813888}, cnt = 1},
> cq_lock = {sem = {__size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>, __align =
> 549755813888}, cnt = 1},
> opts = 0, fd_flags = 2, so_opts = 4, tcp_opts = 2, ipv6_opts = 0, state =
> 1792, cq_armed = 1,
> retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
^^^^^^^^^^^^^^
This looks like part of the problem. There should be 4 control messages available by default. The receiver has sent a control message to the sender, but the control message never completed at the receiver. My guess is that the sender never received it either. It may be the missing buffer update, which would be an RDMA write into the sender's 'target_sgl' array.
Have you ever been able to reproduce this problem without using fork() support?
> sbuf_bytes_avail = 131072,
> sseq_no = 0, sseq_comp = 1024, sq_size = 1024, sq_inline = 64, rq_size =
> 1024, rseq_no = 3515,
> rseq_comp = 4026, rbuf_bytes_avail = 63488, rbuf_free_offset = 65536,
> rbuf_offset = 129024,
> rmsg_head = 440, rmsg_tail = 440, rmsg = 0x1813dd0, remote_sge = 1,
> remote_sgl = {addr = 6784856,
> key = 2550145917, length = 2}, target_mr = 0x1815e60, target_sge = 0,
> target_sgl = {{
> addr = 140617248624656, key = 2550146173, length = 65536}, {addr =
> 140617248690192,
> key = 2550146173, length = 65536}}, rbuf_size = 131072, rmr = 0x17dd680,
> rbuf = 0x7fee00347010 "netperf", sbuf_size = 131072, smr = 0x17dffd0, ssgl =
> {{
> addr = 140660182515728, length = 0, lkey = 3758105205}, {addr =
> 140660182515728, length = 0,
> lkey = 3758105205}}, sbuf = 0x7fee00368010 ""}
>
> netperf: hang in send()
> (gdb) bt
> #0 0x0000003120ad83f0 in __read_nocancel () from /lib64/libc.so.6
> #1 0x0000003da700a1c4 in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
> #2 0x00007fe40157f197 in rs_get_cq_event (rs=0x678610) at src/rsocket.c:941
> #3 0x00007fe401580567 in rs_process_cq (rs=0x678610, nonblock=0,
> test=0x7fe40157e430 <rs_conn_can_send>) at src/rsocket.c:989
> #4 0x00007fe4015808bd in rs_get_comp (rs=0x678610,
> nonblock=<value optimized out>, test=0x7fe40157e430 <rs_conn_can_send>)
> at src/rsocket.c:1019
> #5 0x00007fe401581b92 in rsend (socket=<value optimized out>,
> buf=<value optimized out>, len=65536, flags=<value optimized out>)
> at src/rsocket.c:1244
> #6 0x000000000042ba6d in send_data ()
> #7 0x000000000042d158 in send_omni_inner ()
> #8 0x000000000042fbf1 in send_tcp_stream ()
> #9 0x000000000040239d in main ()
> (gdb) p *(struct rsocket *)0x678610
> $1 = {cm_id = 0x6a2af0, slock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 1}, rlock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 0}, cq_wait_lock = {sem = {
> __size = "\000\000\000\000\200", '\000' <repeats 26 times>,
> __align = 549755813888}, cnt = 1}, opts = 0, fd_flags = 2, so_opts = 4,
> tcp_opts = 0, ipv6_opts = 0, state = 1792, cq_armed = 1, retries = 0, err =
> 0,
> index = 12, ctrl_avail = 4, sqe_avail = 1020, sbuf_bytes_avail = 131072,
> sseq_no = 3522, sseq_comp = 4538, sq_size = 1024, sq_inline = 64,
> rq_size = 1024, rseq_no = 0, rseq_comp = 512, rbuf_bytes_avail = 0,
> rbuf_free_offset = 0, rbuf_offset = 0, rmsg_head = 0, rmsg_tail = 0,
> rmsg = 0x6a2d90, remote_sge = 0, remote_sgl = {addr = 25246472,
> key = 3758105461, length = 2}, target_mr = 0x6a4e20, target_sge = 1,
> target_sgl = {{addr = 140660182446096, key = 3758105717, length = 0}, {
> addr = 140660182511632, key = 3758105717, length = 0}}, rbuf_size =
The sender is stuck waiting for the target_sgl to be updated. One of the target_sgl entries needs a length > 0.
> 131072,
> rmr = 0x677c80, rbuf = 0x7fe401275010 "", sbuf_size = 131072, smr = 0x678980,
> ssgl = {{addr = 140617248825360, length = 2048, lkey = 2550145661}, {
> addr = 140617248759824, length = 0, lkey = 2550145661}},
> sbuf = 0x7fe401296010 "netperf"}
>
I'll see if I can find anything in the code that would account for this. I don't understand why we're missing a completion, or what causes this to occur after sending a specific number of bytes (38469632). That number of bytes is evenly divisible by 64k, though, which is half the size of the send/receive buffers and puts us right on a boundary condition. :/
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-17 22:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-17 22:36 UTC (permalink / raw)
To: Hefty, Sean, Sridhar Samudrala
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
>
> ^^^^^^^^^^^^^^
> This looks like part of the problem. There should be 4 control messages
> available by default. The receiver has sent a control message to the sender,
> but the control message never completed at the receiver. My guess is that the
> sender never received it either. It may be the missing buffer update, which
> would be an RDMA write into the sender's 'target_sgl' array.
>
> Have you ever been able to reproduce this problem without using fork() support?
A simple check to add is whether the rs_post_write() call ever fails, particularly in rs_send_credits(). Checking for an error from ibv_get_cq_event() in rs_get_cq_event() may also be useful. In neither case do I expect an error, but we could confirm it.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-18 0:34 ` Sridhar Samudrala
2012-08-22 23:35 ` Hefty, Sean
1 sibling, 0 replies; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-18 0:34 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Fri, 2012-08-17 at 22:36 +0000, Hefty, Sean wrote:
> > > retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
> >
> > ^^^^^^^^^^^^^^
> > This looks like part of the problem. There should be 4 control messages
> > available by default. The receiver has sent a control message to the sender,
> > but the control message never completed at the receiver. My guess is that the
> > sender never received it either. It may be the missing buffer update, which
> > would be an RDMA write into the sender's 'target_sgl' array.
> >
> > Have you ever been able to reproduce this problem without using fork() support?
>
> A simple check to add is whether the rs_post_write() call ever fails, but in rs_send_credits() in particular. Checking for an error from ibv_get_cq_event() in rs_get_cq_event() may also be useful. In neither case do I expect an error, but we could confirm it.
I didn't see any errors from these calls.
So far I have been able to consistently reproduce this hang only when fork support is enabled.
I am seeing this issue randomly, even with a simple TCP client/server test, but only with fork.
Thanks
Sridhar
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-18 0:34 ` Sridhar Samudrala
@ 2012-08-22 23:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-22 23:35 UTC (permalink / raw)
To: Sridhar Samudrala, roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
I haven't identified the specific problem with fork support, but I did see this in libmlx4:
mlx4_alloc_context()
{
	...
	context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
			    MAP_SHARED, cmd_fd, 0);
	if (context->uar == MAP_FAILED)
		goto err_free;

	if (resp.bf_reg_size) {
		context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
					PROT_WRITE, MAP_SHARED, cmd_fd,
					to_mdev(ibdev)->page_size);
	...
}
I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
I added the following hack to the preload socket() call, which skips calling rsocket().
	if ((domain == PF_INET || domain == PF_INET6) &&
	    (type == SOCK_STREAM) && (!protocol || protocol == IPPROTO_TCP) &&
	    fork_support) {
		printf("skipping rsocket call\n");
		goto realsock;
	}

	recursive = 1;
	ret = rsocket(domain, type, protocol);
	recursive = 0;
	if (ret >= 0) {
		if (fork_support) {
			rclose(ret);
realsock:
			ret = real.socket(domain, type, protocol);
With the above hack, I no longer see hangs when running with fork(), and the netperf performance is split across the instances. This should delay initializing libibverbs until after fork() has been called.
I'm trying to decide whether to keep this work-around or find a better solution.
- Sean
* RE: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-08-23 0:30 ` Sridhar Samudrala
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-24 18:22 ` Roland Dreier
1 sibling, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-08-23 0:30 UTC (permalink / raw)
To: Hefty, Sean
Cc: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, 2012-08-22 at 23:35 +0000, Hefty, Sean wrote:
> I haven't identified the specific problem with fork support, but I did see this in libmlx4:
>
> mlx4_alloc_context()
> {
> ...
> context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
> MAP_SHARED, cmd_fd, 0);
> if (context->uar == MAP_FAILED)
> goto err_free;
>
> if (resp.bf_reg_size) {
> context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
> PROT_WRITE, MAP_SHARED, cmd_fd,
> to_mdev(ibdev)->page_size);
> ...
> }
>
> I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
>
> I added the following hack to the preload socket() call, which skips calling rsocket().
>
> if ((domain == PF_INET || domain == PF_INET6) &&
> (type == SOCK_STREAM) && (!protocol || protocol == IPPROTO_TCP) && fork_support) {
> printf("skipping rsocket call\n");
> goto realsock;
> }
>
> recursive = 1;
> ret = rsocket(domain, type, protocol);
> recursive = 0;
> if (ret >= 0) {
> if (fork_support) {
> rclose(ret);
> realsock:
> ret = real.socket(domain, type, protocol);
>
> With the above hack, I no longer see hangs when running with fork() and the netperf performance is split. This should delay libibverbs initializing until after fork() has been called.
Yes, this fixes the hangs I am seeing with netperf too.
I saw this code in the preload library and was wondering why rsocket() is called
and closed immediately when fork_support is enabled. I guess you are doing this
so that you can fall back to a real socket at the initial socket() call instead
of waiting all the way until fork_active()/fork_passive(). This should be OK,
specifically if a user is using the preload library and enabling fork_support.
This doesn't look like a hack to me. However, it would be a bug if rsocket()
followed by rclose() doesn't clean up all the resources correctly.
Thanks
Sridhar
> I'm trying to decide whether to keep this work-around or find a better solution.
>
> - Sean
* RE: rsockets and fork
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
@ 2012-08-23 1:32 ` Hefty, Sean
0 siblings, 0 replies; 19+ messages in thread
From: Hefty, Sean @ 2012-08-23 1:32 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> I saw this code in the preload library and was wondering why rsocket() is called
> and closed immediately when fork_support is enabled. I guess you are doing this
> so that you can fall back to a real socket at the initial socket() call instead
> of waiting all the way until fork_active/fork_passive. This should be OK,
> especially if a user is using the preload library and enabling fork_support.
The initial call to rsocket() is made to see if that socket type is enabled. This way I don't need to update both rsocket and the preload library when new socket types are supported. So, yes, it allows us to fall back to a normal socket quickly if rsockets does not support the requested type.
> This doesn't look like a hack to me. However, it does look like a bug if rsocket()
> followed by rclose() doesn't clean up all the resources correctly.
I can come up with something a little cleaner, which I think would work okay. I'm just wondering if this isn't really an issue with libmlx4 supporting fork(). I just need some way to confirm that this is the cause of the hangs.
Regardless, I'll try to push a patch with this fix by the end of the week.
Thanks for your help!
- Sean
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-23 0:30 ` Sridhar Samudrala
@ 2012-08-24 18:22 ` Roland Dreier
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2012-08-24 18:22 UTC (permalink / raw)
To: Hefty, Sean
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, Aug 22, 2012 at 4:35 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I'm haven't identified the specific problem with fork support, but I did see this in libmlx4:
>
> mlx4_alloc_context()
> {
> ...
> context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
> MAP_SHARED, cmd_fd, 0);
> if (context->uar == MAP_FAILED)
> goto err_free;
>
> if (resp.bf_reg_size) {
> context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
> PROT_WRITE, MAP_SHARED, cmd_fd,
> to_mdev(ibdev)->page_size);
> ...
> }
>
> I don't know for certain that these mmap() calls cause an issue, but the preload library socket() function calls rsocket(), which loads and initializes libibverbs. This calls mlx4_alloc_context() before fork() has been called.
I don't think those mmap()s should be an issue with fork.... they are
mapping adapter PCI space into userspace, but it should work across
fork.
- R.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: rsockets and fork
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-08-24 18:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-08-24 18:35 UTC (permalink / raw)
To: Roland Dreier
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> I don't think those mmap()s should be an issue with fork.... they are
> mapping adapter PCI space into userspace, but it should work across
> fork.
makes sense
Do you have any ideas on ways to identify what in the initialization paths might cause the problems? (Assuming that is where the problem is.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-11-29 16:09 ` Or Gerlitz
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Or Gerlitz @ 2012-11-29 16:09 UTC (permalink / raw)
To: Hefty, Sean
Cc: Roland Dreier, Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Fri, Aug 24, 2012 at 8:35 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> I don't think those mmap()s should be an issue with fork.... they are
>> mapping adapter PCI space into userspace, but it should work across fork.
> makes sense
> Do you have any ideas on ways to identify what in the initialization paths might cause the problems? (Assuming that is where the problem is.)
Hi Sean,
Was there progress on using rsockets under netperf, or with fork in general?
Do you think that, for netperf and maybe other fork use cases, rsockets
could use CM redirect (e.g. through the rdma-cm supporting rdma_redirect)
within the server to bypass the fork limitation?
Or.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: rsockets and fork
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-29 16:27 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Hefty, Sean @ 2012-11-29 16:27 UTC (permalink / raw)
To: Or Gerlitz
Cc: Roland Dreier, Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Was there progress on using rsockets under netperf, or with fork in general?
Fork support in rsockets is available now, and netperf should work. It's not generic enough to handle any arbitrary call to fork(), but it will work if the app does something like this:
listen()
s = accept()
fork(s)
> Do you think for the netperf and maybe other uses case of fork rsockets can use
> CM redirect (e.g through the rdma-cm supporting rdma_redirect - within
> the server to bypass the fork limitation?
The last problem I hit was trying to support vsftpd, which calls chroot(). Handling chroot requires kernel changes. But between fork, fstat, chroot, dup2, epoll, and other calls, a kernel implementation of rsockets (or something similar) looks more appealing for apps that favor full compatibility over performance.
- Sean
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rsockets and fork
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-12-02 12:20 ` Or Gerlitz
2012-12-02 12:20 ` Or Gerlitz
1 sibling, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2012-12-02 12:20 UTC (permalink / raw)
To: Hefty, Sean
Cc: Sridhar Samudrala,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 29/11/2012 18:27, Hefty, Sean wrote:
> Fork support in rsockets is available now, and netperf should work.
Sean,
Trying the latest librdmacm git (plus the patch I sent to have it build
OK), we weren't able to run netperf. Here's the server output:
> starting netserver with host '192.168.20.126' port '12865' and family
> AF_UNSPEC
> accept_connections: enter
> set_fdset: enter list 0x22ea840 fd_set 0x7fff7134e040
> setting 3 in fdset
> accept_connection: enter
> spawn_child: enter
> close_listens: enter
> set_fdset: enter list 0x22ea840 fd_set 0x7fff7134e040
> setting 3 in fdset
and this is the client output
> resolve_host called with host '192.168.20.126' port '(null)' family
> AF_UNSPEC
> getaddrinfo returned the following for host '192.168.20.126' port
> '(null)' family AF_UNSPEC
> cannonical name: '192.168.20.126'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 0 0 192 168 20 126 0 0 0 0 0 0 0 0 0 0
> scan_omni_args called with the following argument vector
> netperf -v 2 -d -H 192.168.20.126
> Program name: netperf
> Local send alignment: 8
> Local recv alignment: 8
> Remote send alignment: 8
> Remote recv alignment: 8
> Report local CPU 0
> Report remote CPU 0
> Verbosity: 2
> Debug: 1
> Port: 12865
> Test name: TCP_STREAM
> Test bytes: 0 Test time: 10 Test trans: 0
> Host name: 192.168.20.126
>
> installing catcher for all signals
> Could not install signal catcher for sig 32, errno 22
> Could not install signal catcher for sig 33, errno 22
> Could not install signal catcher for sig 65, errno 22
> remotehost is 192.168.20.126 and port 12865
> resolve_host called with host '192.168.20.126' port '12865' family AF_INET
> getaddrinfo returned the following for host '192.168.20.126' port
> '12865' family AF_INET
> cannonical name: '192.168.20.126'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 50 65 192 168 20 126 0 0 0 0 0 0 0
> 0 0 0
> resolve_host called with host '0.0.0.0' port '0' family AF_UNSPEC
> getaddrinfo returned the following for host '0.0.0.0' port '0' family
> AF_UNSPEC
> cannonical name: '0.0.0.0'
> flags: 22 family: AF_INET: socktype: SOCK_STREAM protocol
> IPPROTO_TCP addrlen 16
> sa_family: AF_INET sadata: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> establish_control called with host '192.168.20.126' port '12865'
> remfam AF_INET
> local '0.0.0.0' port '0' locfam AF_UNSPEC
> bound control socket to 0.0.0.0 and 0
> successful connection to remote netserver at 192.168.20.126 and 12865
> recv_response: received a 0 byte response
> recv_response: partial response received: 0 bytes
We used netperf version 2.5.0.
Or.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads: [~2012-12-02 12:20 UTC | newest]
Thread overview: 19+ messages
-- links below jump to the message on this page --
2012-08-13 23:12 rsockets and fork Sridhar Samudrala
[not found] ` <1344899557.2101.29.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-13 23:39 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A7DB36-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-14 17:51 ` Sridhar Samudrala
[not found] ` <502A9039.4090505-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-08-14 17:56 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89125-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-14 18:18 ` Sridhar Samudrala
[not found] ` <1344968289.2101.36.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-16 23:40 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89AEB-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-17 21:40 ` Sridhar Samudrala
[not found] ` <1345239625.1128.20.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-17 22:17 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F70-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-17 22:36 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A89F90-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-18 0:34 ` Sridhar Samudrala
2012-08-22 23:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8A7A8-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-08-23 0:30 ` Sridhar Samudrala
[not found] ` <1345681858.25565.12.camel-5vSEHtyIv2TJ4MwkZ4db91aTQe2KTcn/@public.gmane.org>
2012-08-23 1:32 ` Hefty, Sean
2012-08-24 18:22 ` Roland Dreier
[not found] ` <CAL1RGDW+QSXnfa5yvEsb7XqXMVbVVU6sz7BEKRPmc0tSYi8m9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-08-24 18:35 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346A8ADD4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-11-29 16:09 ` Or Gerlitz
[not found] ` <CAJZOPZJWzKZ3fjUsbBVA8-Dv+YTajJBStP2VzBw=VrfXHSi=xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-29 16:27 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237346ADAC17-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-12-02 12:20 ` Or Gerlitz
2012-12-02 12:20 ` Or Gerlitz