public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
@ 2017-12-14 11:02 Honggang LI
       [not found] ` <20171214110241.4701-1-honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Honggang LI @ 2017-12-14 11:02 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: honli-H+wXaHxf7aLQT0dZR+AlfA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/

From: Honggang Li <honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

[root@localhost ~]# /usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0
id_ext=0002c90300317810,ioc_guid=0002c90300317810,dgid=fe800000000000000002c90300317811,pkey=ffff,service_id=0002c90300317810

[root@localhost ~]# echo $?
130

fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So,
default signal handler will be used when it receives signal SIGINT.
ibsrpdm will exit with non-zero exit code as default signal handler
killed it. ibsrpdm should return with exit code zero, if no error
emerged.

main->ibsrpdm->free_res->pthread_kill(res->async_ev_thread, SIGINT)

Beside the wrong exit code, there is an extra blank line at the end of
output of ibsrpdm.

Fixes: fd3005f0cd34 ("srp_daemon: Move the setup of the wakeup_pipe after openlog")
Signed-off-by: Honggang Li <honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 srp_daemon/srp_daemon.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index cec36db2..0d00d712 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1996,7 +1996,6 @@ static void cleanup_wakeup_fd(void)
 
 static int setup_wakeup_fd(void)
 {
-	struct sigaction sa = {};
 	int ret;
 
 	ret = pipe2(wakeup_pipe, O_NONBLOCK | O_CLOEXEC);
@@ -2005,11 +2004,6 @@ static int setup_wakeup_fd(void)
 		return -1;
 	}
 
-	sigemptyset(&sa.sa_mask);
-	sa.sa_handler = signal_handler;
-	sigaction(SIGINT, &sa, NULL);
-	sigaction(SIGTERM, &sa, NULL);
-	sigaction(SRP_CATAS_ERR, &sa, NULL);
 	return 0;
 }
 
@@ -2100,6 +2094,7 @@ int main(int argc, char *argv[])
 	int			lockfd = -1;
 	int			received_signal = 0;
 	bool                    systemd;
+	struct sigaction        sa = {};
 
 #ifndef __CHECKER__
 	/*
@@ -2117,6 +2112,12 @@ int main(int argc, char *argv[])
 	BUILD_ASSERT(sizeof(struct srp_dm_iou_info) == 132);
 	BUILD_ASSERT(sizeof(struct srp_dm_ioc_prof) == 128);
 
+	sigemptyset(&sa.sa_mask);
+	sa.sa_handler = signal_handler;
+	sigaction(SIGINT, &sa, NULL);
+	sigaction(SIGTERM, &sa, NULL);
+	sigaction(SRP_CATAS_ERR, &sa, NULL);
+
 	if (strcmp(argv[0] + max_t(int, 0, strlen(argv[0]) - strlen("ibsrpdm")),
 		   "ibsrpdm") == 0) {
 		ret = ibsrpdm(argc, argv);
-- 
2.14.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found] ` <20171214110241.4701-1-honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-12-14 14:59   ` Bart Van Assche
       [not found]     ` <1513263572.2986.2.camel-Sjgp3cTcYWE@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-12-14 14:59 UTC (permalink / raw)
  To: honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org

On Thu, 2017-12-14 at 19:02 +0800, Honggang LI wrote:
> fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So,
> default signal handler will be used when it receives signal SIGINT.
> ibsrpdm will exit with non-zero exit code as default signal handler
> killed it.

Can you explain why you think that ibsrpdm needs a signal handler?

> ibsrpdm should return with exit code zero, if no error emerged.

That's correct, but if it was interrupted by a signal handler an error
code should be returned.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]     ` <1513263572.2986.2.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2017-12-15  1:36       ` Honggang LI
       [not found]         ` <20171215013628.GA743-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Honggang LI @ 2017-12-15  1:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org

On Thu, Dec 14, 2017 at 02:59:34PM +0000, Bart Van Assche wrote:
> On Thu, 2017-12-14 at 19:02 +0800, Honggang LI wrote:
> > fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So,
> > default signal handler will be used when it receives signal SIGINT.
> > ibsrpdm will exit with non-zero exit code as default signal handler
> > killed it.
> 
> Can you explain why you think that ibsrpdm needs a signal handler?

main
  ibsrpdm
    alloc_res
      pthread_create(&res->res.async_ev_thread [1]
    ....
    free_res
      if (res->async_ev_thread) { pthread_kill(res->async_ev_thread, SIGINT); [2]


The ibsrpdm program create a thread in [1], and send a SIGINT in [2].
The default behavior of SIGINT is to terminate the process.

According to the man page of pthread_kill:
 Signal dispositions are process-wide: if a signal handler is installed,
 the handler will be invoked in the thread _thread_, but if the disposition 
 of the signal is "stop", "continue", or "terminate", this action will 
 affect the whole process.

If we did not install a signal handler for ibsrpdm, ibsrpdm will be
terminated in [2], when it free the resource. ibsrpdm expected to return
to the main program after it free the resource. However, it failed as it
killed by default signal handler.

> 
> > ibsrpdm should return with exit code zero, if no error emerged.
> 
> That's correct, but if it was interrupted by a signal handler an error
> code should be returned.
> 
> Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]         ` <20171215013628.GA743-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
@ 2017-12-15 17:28           ` Jason Gunthorpe
       [not found]             ` <20171215172800.GA12434-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2017-12-15 17:28 UTC (permalink / raw)
  To: Honggang LI
  Cc: Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Dec 15, 2017 at 09:36:28AM +0800, Honggang LI wrote:
> On Thu, Dec 14, 2017 at 02:59:34PM +0000, Bart Van Assche wrote:
> > On Thu, 2017-12-14 at 19:02 +0800, Honggang LI wrote:
> > > fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So,
> > > default signal handler will be used when it receives signal SIGINT.
> > > ibsrpdm will exit with non-zero exit code as default signal handler
> > > killed it.
> > 
> > Can you explain why you think that ibsrpdm needs a signal handler?
> 
> main
>   ibsrpdm
>     alloc_res
>       pthread_create(&res->res.async_ev_thread [1]
>     ....
>     free_res
>       if (res->async_ev_thread) { pthread_kill(res->async_ev_thread, SIGINT); [2]
> 
> 
> The ibsrpdm program create a thread in [1], and send a SIGINT in [2].
> The default behavior of SIGINT is to terminate the process.

Yuk, no, using signals like this is horrifyingly buggy.

Here is a sketch on how to fix it properly. All the users of
pthread_kill should be eliminated.

Though overall, there is really no reason to even cleanup the threads,
just call exit?

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index cec36db2e0f12e..fe307705ad6a51 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -61,6 +61,7 @@
 #include <string.h>
 #include <signal.h>
 #include <sys/syslog.h>
+#include <sys/eventfd.h>
 #include <infiniband/umad.h>
 #include <infiniband/umad_types.h>
 #include <infiniband/umad_sa.h>
@@ -1887,7 +1888,9 @@ static void free_res(struct resources *res)
 		modify_qp_to_err(res->ud_res->qp);
 
 	if (res->reconnect_thread) {
-		pthread_kill(res->reconnect_thread, SIGINT);
+		uint64_t val = 1;
+
+		write(res->sync_res->stop_event_fd, &val, sizeof(val));
 		pthread_join(res->reconnect_thread, &status);
 	}
 	if (res->async_ev_thread) {
@@ -1947,6 +1950,7 @@ static struct resources *alloc_res(void)
 			goto err;
 	}
 
+	res->sync_res.stop_event_fd = eventfd(0, EFD_CLOEXEC);
 	ret = pthread_create(&res->res.async_ev_thread, NULL,
 			     run_thread_listen_to_events, &res->res);
 	if (ret)
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed395e17e..8cbee09d7c688b 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -245,6 +245,7 @@ enum {
 };
 
 struct sync_resources {
+	int stop_event_fd;
 	int stop_threads;
 	int next_task;
 	struct timespec next_recalc_time;
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6b36b15cc84c16..cedc2b0ab8e5af 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -44,6 +44,7 @@
 #include <infiniband/verbs.h>
 #include <infiniband/umad_sa.h>
 #include <infiniband/umad_sm.h>
+#include <poll.h>
 
 #include "srp_ib_types.h"
 
@@ -788,69 +789,94 @@ int register_to_traps(struct resources *res, int subscribe)
 
 }
 
-void *run_thread_listen_to_events(void *res_in)
+static int do_async_event(struct resources *res)
 {
-	struct resources *res = (struct resources *)res_in;
 	struct ibv_async_event event;
 
-	while (!stop_threads(res->sync_res)) {
-		if (ibv_get_async_event(res->ud_res->ib_ctx, &event)) {
-			if (errno != EINTR)
-				pr_err("ibv_get_async_event failed (errno = %d)\n",
-				       errno);
-			break;
+	if (ibv_get_async_event(res->ud_res->ib_ctx, &event)) {
+		if (errno != EINTR)
+			pr_err("ibv_get_async_event failed (errno = %d)\n",
+			       errno);
+		return -1;
+	}
+
+	pr_debug("event_type %d, port %d\n", event.event_type,
+		 event.element.port_num);
+
+	switch (event.event_type) {
+	case IBV_EVENT_PORT_ACTIVE:
+	case IBV_EVENT_SM_CHANGE:
+	case IBV_EVENT_LID_CHANGE:
+	case IBV_EVENT_CLIENT_REREGISTER:
+	case IBV_EVENT_PKEY_CHANGE:
+		if (event.element.port_num == config->port_num) {
+			pthread_mutex_lock(&res->sync_res->mutex);
+			__schedule_rescan(res->sync_res, 0);
+			wake_up_main_loop(0);
+			pthread_mutex_unlock(&res->sync_res->mutex);
 		}
+		break;
 
-		pr_debug("event_type %d, port %d\n",
-			 event.event_type, event.element.port_num);
-
-		switch (event.event_type) {
-		case IBV_EVENT_PORT_ACTIVE:
-		case IBV_EVENT_SM_CHANGE:
-		case IBV_EVENT_LID_CHANGE:
-		case IBV_EVENT_CLIENT_REREGISTER:
-		case IBV_EVENT_PKEY_CHANGE:
-			if (event.element.port_num == config->port_num) {
-				pthread_mutex_lock(&res->sync_res->mutex);
-				__schedule_rescan(res->sync_res, 0);
-				wake_up_main_loop(0);
-				pthread_mutex_unlock(&res->sync_res->mutex);
-			}
-		  	break;
-
-		case IBV_EVENT_DEVICE_FATAL:
-		case IBV_EVENT_CQ_ERR:
-		case IBV_EVENT_QP_FATAL:
-		  /* clean and restart */
-			pr_err("Critical event %d, raising catastrophic "
-			       "error signal\n", event.event_type);
-			raise(SRP_CATAS_ERR);
-			break;
+	case IBV_EVENT_DEVICE_FATAL:
+	case IBV_EVENT_CQ_ERR:
+	case IBV_EVENT_QP_FATAL:
+		/* clean and restart */
+		pr_err("Critical event %d, raising catastrophic "
+		       "error signal\n",
+		       event.event_type);
+		raise(SRP_CATAS_ERR);
+		break;
 
- 	      	 /*
+		/*
 
-		case IBV_EVENT_PORT_ERR:
-		case IBV_EVENT_QP_REQ_ERR:
-		case IBV_EVENT_QP_ACCESS_ERR:
-		case IBV_EVENT_COMM_EST:
-		case IBV_EVENT_SQ_DRAINED:
-		case IBV_EVENT_PATH_MIG:
-		case IBV_EVENT_PATH_MIG_ERR:
-		case IBV_EVENT_SRQ_ERR:
-		case IBV_EVENT_SRQ_LIMIT_REACHED:
-		case IBV_EVENT_QP_LAST_WQE_REACHED:
+	       case IBV_EVENT_PORT_ERR:
+	       case IBV_EVENT_QP_REQ_ERR:
+	       case IBV_EVENT_QP_ACCESS_ERR:
+	       case IBV_EVENT_COMM_EST:
+	       case IBV_EVENT_SQ_DRAINED:
+	       case IBV_EVENT_PATH_MIG:
+	       case IBV_EVENT_PATH_MIG_ERR:
+	       case IBV_EVENT_SRQ_ERR:
+	       case IBV_EVENT_SRQ_LIMIT_REACHED:
+	       case IBV_EVENT_QP_LAST_WQE_REACHED:
 
-		*/
+	       */
 
 
-		default:
+	default:
+		break;
+	}
+
+	ibv_ack_async_event(&event);
+	return 0;
+}
+
+void *run_thread_listen_to_events(void *res_in)
+{
+	struct pollfd fds[2];
+	struct resources *res = (struct resources *)res_in;
+
+	fds[0].fd = res->ud_res->ib_ctx->async_fd;
+	fds[0].events = POLLIN;
+	fds[1].fd = res->sync_res->stop_event_fd;
+	fds[1].events = POLLIN;
+
+	while (1) {
+		if (poll(fds, 2, -1) == -1) {
+			if (errno == EINTR)
+				continue;
+			pr_err("ibv_get_async_event failed (errno = %d)\n",
+			       errno);
 			break;
 		}
 
-		ibv_ack_async_event(&event);
+		if (fds[1].revents & POLLIN)
+			break;
 
+		if (fds[0].revents & POLLIN)
+			if (do_async_event(res))
+				break;
 	}
 
 	return NULL;
 }
-
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]             ` <20171215172800.GA12434-uk2M96/98Pc@public.gmane.org>
@ 2017-12-15 17:50               ` Bart Van Assche
       [not found]                 ` <1513360253.2571.23.camel-Sjgp3cTcYWE@public.gmane.org>
  2017-12-19 12:20               ` Honggang LI
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-12-15 17:50 UTC (permalink / raw)
  To: honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	jgg-uk2M96/98Pc@public.gmane.org
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1045 bytes --]

On Fri, 2017-12-15 at 10:28 -0700, Jason Gunthorpe wrote:
> Though overall, there is really no reason to even cleanup the threads,
> just call exit?

Memory leak detectors like the one in Valgrind only produce meaningful results
if threads are stopped cleanly before exit() is called.

> @@ -1887,7 +1888,9 @@ static void free_res(struct resources *res)
>  		modify_qp_to_err(res->ud_res->qp);
>  
>  	if (res->reconnect_thread) {
> -		pthread_kill(res->reconnect_thread, SIGINT);
> +		uint64_t val = 1;
> +
> +		write(res->sync_res->stop_event_fd, &val, sizeof(val));
>  		pthread_join(res->reconnect_thread, &status);

This patch does not remove all pthread_kill() calls. There are two other such
calls:

$ git grep -nHw pthread_kill
srp_daemon/srp_daemon.c:1897:		pthread_kill(res->async_ev_thread, SIGINT);
srp_daemon/srp_daemon.c:1901:		pthread_kill(res->trap_thread, SIGINT);

Bart.N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]                 ` <1513360253.2571.23.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2017-12-15 17:59                   ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2017-12-15 17:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Dec 15, 2017 at 05:50:54PM +0000, Bart Van Assche wrote:
> On Fri, 2017-12-15 at 10:28 -0700, Jason Gunthorpe wrote:
> > Though overall, there is really no reason to even cleanup the threads,
> > just call exit?
> 
> Memory leak detectors like the one in Valgrind only produce meaningful results
> if threads are stopped cleanly before exit() is called.

Fair enough

> > @@ -1887,7 +1888,9 @@ static void free_res(struct resources *res)
> >  		modify_qp_to_err(res->ud_res->qp);
> >  
> >  	if (res->reconnect_thread) {
> > -		pthread_kill(res->reconnect_thread, SIGINT);
> > +		uint64_t val = 1;
> > +
> > +		write(res->sync_res->stop_event_fd, &val, sizeof(val));
> >  		pthread_join(res->reconnect_thread, &status);
> 
> This patch does not remove all pthread_kill() calls. There are two other such
> calls:
> 
> $ git grep -nHw pthread_kill
> srp_daemon/srp_daemon.c:1897:		pthread_kill(res->async_ev_thread, SIGINT);
> srp_daemon/srp_daemon.c:1901:		pthread_kill(res->trap_thread, SIGINT);

Correct, this is why I called it a sketch :) I'm sure there are a few
mistakes too..

Hopefully Honggang can finish it and test it, as he found a real bug.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]             ` <20171215172800.GA12434-uk2M96/98Pc@public.gmane.org>
  2017-12-15 17:50               ` Bart Van Assche
@ 2017-12-19 12:20               ` Honggang LI
       [not found]                 ` <20171219122050.GA19682-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Honggang LI @ 2017-12-19 12:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Dec 15, 2017 at 10:28:00AM -0700, Jason Gunthorpe wrote:
> On Fri, Dec 15, 2017 at 09:36:28AM +0800, Honggang LI wrote:
> > On Thu, Dec 14, 2017 at 02:59:34PM +0000, Bart Van Assche wrote:
> > > On Thu, 2017-12-14 at 19:02 +0800, Honggang LI wrote:
> > > > fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So,
> > > > default signal handler will be used when it receives signal SIGINT.
> > > > ibsrpdm will exit with non-zero exit code as default signal handler
> > > > killed it.
> > > 
> > > Can you explain why you think that ibsrpdm needs a signal handler?
> > 
> > main
> >   ibsrpdm
> >     alloc_res
> >       pthread_create(&res->res.async_ev_thread [1]
> >     ....
> >     free_res
> >       if (res->async_ev_thread) { pthread_kill(res->async_ev_thread, SIGINT); [2]
> > 
> > 
> > The ibsrpdm program create a thread in [1], and send a SIGINT in [2].
> > The default behavior of SIGINT is to terminate the process.
> 
> Yuk, no, using signals like this is horrifyingly buggy.

OK, I agree with you. The "signal_handler", which calls "wake_up_main_loop",
is not the right solution, because wakeup_pipe never used by ibsrpdm.

<snip>
void wake_up_main_loop(char ch)
{
        int res;

	assert(wakeup_pipe[1] >= 0); // this 'assert' will be failed
<snip>	


> Here is a sketch on how to fix it properly. All the users of
> pthread_kill should be eliminated.
> 
> Though overall, there is really no reason to even cleanup the threads,
> just call exit?

No, if one pthread just calls 'exit', the entire process will be
terminated immediately. So, we need to cleanup the threads.

I think the source of current issue is the async_ev_thread pthread.
We should *NOT* create such pthread for ibsrpdm.

I checked the old srptools git repo.

git://git.openfabrics.org/~bvanassche/srptools.git

Commit ab57a5b92eb3b8c9221f77235a028814a462d2cb merges "ibsrpdm" into
"srp_daemon". The old ibsrpdm program is a single thread program.
srp_daemon is multi-thread program.

As ibsrpdm always only run once, because "config->once" has been
assigned to 1. I would suggest to apply this patch.

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index cec36db2..4012a7db 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1945,12 +1945,12 @@ static struct resources *alloc_res(void)
 				     run_thread_get_trap_notices, &res->res);
 		if (ret)
 			goto err;
-	}
 
-	ret = pthread_create(&res->res.async_ev_thread, NULL,
-			     run_thread_listen_to_events, &res->res);
-	if (ret)
-		goto err;
+		ret = pthread_create(&res->res.async_ev_thread, NULL,
+				     run_thread_listen_to_events, &res->res);
+		if (ret)
+			goto err;
+	}
 
 	if (config->retry_timeout && !config->once) {
 		ret = pthread_create(&res->res.reconnect_thread, NULL,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]                 ` <20171219122050.GA19682-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
@ 2017-12-19 16:57                   ` Bart Van Assche
       [not found]                     ` <1513702653.2535.3.camel-Sjgp3cTcYWE@public.gmane.org>
  2017-12-19 21:13                   ` Jason Gunthorpe
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-12-19 16:57 UTC (permalink / raw)
  To: honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	jgg-uk2M96/98Pc@public.gmane.org
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, 2017-12-19 at 20:20 +0800, Honggang LI wrote:
> As ibsrpdm always only run once, because "config->once" has been
> assigned to 1. I would suggest to apply this patch.
> 
> diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
> index cec36db2..4012a7db 100644
> --- a/srp_daemon/srp_daemon.c
> +++ b/srp_daemon/srp_daemon.c
> @@ -1945,12 +1945,12 @@ static struct resources *alloc_res(void)
>  				     run_thread_get_trap_notices, &res->res);
>  		if (ret)
>  			goto err;
> -	}
>  
> -	ret = pthread_create(&res->res.async_ev_thread, NULL,
> -			     run_thread_listen_to_events, &res->res);
> -	if (ret)
> -		goto err;
> +		ret = pthread_create(&res->res.async_ev_thread, NULL,
> +				     run_thread_listen_to_events, &res->res);
> +		if (ret)
> +			goto err;
> +	}
>  
>  	if (config->retry_timeout && !config->once) {
>  		ret = pthread_create(&res->res.reconnect_thread, NULL,

Please submit this as a proper patch and add the following:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]                     ` <1513702653.2535.3.camel-Sjgp3cTcYWE@public.gmane.org>
@ 2017-12-19 19:12                       ` Honggang LI
  0 siblings, 0 replies; 10+ messages in thread
From: Honggang LI @ 2017-12-19 19:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jgg-uk2M96/98Pc@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Dec 19, 2017 at 04:57:35PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-19 at 20:20 +0800, Honggang LI wrote:
> > As ibsrpdm always only run once, because "config->once" has been
> > assigned to 1. I would suggest to apply this patch.
> > 
> > diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
> > index cec36db2..4012a7db 100644
> > --- a/srp_daemon/srp_daemon.c
> > +++ b/srp_daemon/srp_daemon.c
> > @@ -1945,12 +1945,12 @@ static struct resources *alloc_res(void)
> >  				     run_thread_get_trap_notices, &res->res);
> >  		if (ret)
> >  			goto err;
> > -	}
> >  
> > -	ret = pthread_create(&res->res.async_ev_thread, NULL,
> > -			     run_thread_listen_to_events, &res->res);
> > -	if (ret)
> > -		goto err;
> > +		ret = pthread_create(&res->res.async_ev_thread, NULL,
> > +				     run_thread_listen_to_events, &res->res);
> > +		if (ret)
> > +			goto err;
> > +	}
> >  
> >  	if (config->retry_timeout && !config->once) {
> >  		ret = pthread_create(&res->res.reconnect_thread, NULL,
> 
> Please submit this as a proper patch and add the following:

[rdma-core PATCH] srp_daemon: Don't create async_ev_thread if only run once

I sent the patch with that $SUBJECT.

thanks
> Reviewed-by: Bart Van Assche <bart.vanassche-Sjgp3cTcYWE@public.gmane.org>
> 
> Thanks,
> 
> Bart.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm
       [not found]                 ` <20171219122050.GA19682-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
  2017-12-19 16:57                   ` Bart Van Assche
@ 2017-12-19 21:13                   ` Jason Gunthorpe
  1 sibling, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2017-12-19 21:13 UTC (permalink / raw)
  To: Honggang LI
  Cc: Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Dec 19, 2017 at 08:20:50PM +0800, Honggang LI wrote:

> > Here is a sketch on how to fix it properly. All the users of
> > pthread_kill should be eliminated.
> > 
> > Though overall, there is really no reason to even cleanup the threads,
> > just call exit?
> 
> No, if one pthread just calls 'exit', the entire process will be
> terminated immediately. So, we need to cleanup the threads.
> 
> I think the source of current issue is the async_ev_thread pthread.
> We should *NOT* create such pthread for ibsrpdm.
> 
> I checked the old srptools git repo.

FYI, this is merged into rdma-core, use
  git log -p --follow srp_daemon/srp_daemon.c

and it will show you full history on a single file.

> Commit ab57a5b92eb3b8c9221f77235a028814a462d2cb merges "ibsrpdm" into
> "srp_daemon". The old ibsrpdm program is a single thread program.
> srp_daemon is multi-thread program.

Makes sense that ibsrpdm does not need the
run_thread_listen_to_events() thread. Patch looks OK to me too.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-12-19 21:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-14 11:02 [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm Honggang LI
     [not found] ` <20171214110241.4701-1-honli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-12-14 14:59   ` Bart Van Assche
     [not found]     ` <1513263572.2986.2.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-15  1:36       ` Honggang LI
     [not found]         ` <20171215013628.GA743-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
2017-12-15 17:28           ` Jason Gunthorpe
     [not found]             ` <20171215172800.GA12434-uk2M96/98Pc@public.gmane.org>
2017-12-15 17:50               ` Bart Van Assche
     [not found]                 ` <1513360253.2571.23.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-15 17:59                   ` Jason Gunthorpe
2017-12-19 12:20               ` Honggang LI
     [not found]                 ` <20171219122050.GA19682-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
2017-12-19 16:57                   ` Bart Van Assche
     [not found]                     ` <1513702653.2535.3.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-19 19:12                       ` Honggang LI
2017-12-19 21:13                   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox