linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] relay: fix timer madness
@ 2013-04-20  7:37 zhangwei(Jovi)
  2013-04-23 21:28 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: zhangwei(Jovi) @ 2013-04-20  7:37 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, linux-kernel@vger.kernel.org,
	linux-rt-users

Hi,

Ingo, Steven, I get this patch from 3.4 preempt-rt patch set, It seems that this patch
fix relayfs bug not only for rt kernel, but also for mainline.

When I'm using below ktap script to tracing all event tracepoints, without this patch,
the system will hang in few seconds, the patch indeed fix the problem as the changelog pointed.

	function eventfun (e) {
        	printf("%d %d\t%s\t%s", cpu(), pid(), execname(), e.annotate)
	}

	kdebug.probe("tp:", eventfun)

	kdebug.probe_end(function () {
        	printf("probe end\n")
	})


This patch is old, I can found the original patch discussion in 2007.
	http://marc.info/?l=linux-kernel&m=118544794717162&w=2
(In that mail thread, the patch didn't fix that problem, but it fix the problem I encountered now)

I hope you can remember this :)

so why we didn't commit this patch into mainline? any concern?

Thanks.

------------------------------------->
Subject: relay: fix timer madness
From: Ingo Molnar <mingo@elte.hu>

remove timer calls (!!!) from deep within the tracing infrastructure.
This was totally bogus code that can cause lockups and worse.
Poll the buffer every 2 jiffies for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/relay.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

Index: linux-rt-rebase.q/kernel/relay.c
===================================================================
--- linux-rt-rebase.q.orig/kernel/relay.c
+++ linux-rt-rebase.q/kernel/relay.c
@@ -319,6 +319,10 @@ static void wakeup_readers(unsigned long
 {
 	struct rchan_buf *buf = (struct rchan_buf *)data;
 	wake_up_interruptible(&buf->read_wait);
+	/*
+	 * Stupid polling for now:
+	 */
+	mod_timer(&buf->timer, jiffies + 1);
 }

 /**
@@ -336,6 +340,7 @@ static void __relay_reset(struct rchan_b
 		init_waitqueue_head(&buf->read_wait);
 		kref_init(&buf->kref);
 		setup_timer(&buf->timer, wakeup_readers, (unsigned long)buf);
+		mod_timer(&buf->timer, jiffies + 1);
 	} else
 		del_timer_sync(&buf->timer);

@@ -604,15 +609,6 @@ size_t relay_switch_subbuf(struct rchan_
 		buf->subbufs_produced++;
 		buf->dentry->d_inode->i_size += buf->chan->subbuf_size -
 			buf->padding[old_subbuf];
-		smp_mb();
-		if (waitqueue_active(&buf->read_wait))
-			/*
-			 * Calling wake_up_interruptible() from here
-			 * will deadlock if we happen to be logging
-			 * from the scheduler (trying to re-grab
-			 * rq->lock), so defer it.
-			 */
-			__mod_timer(&buf->timer, jiffies + 1);
 	}

 	old = buf->data;


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] relay: fix timer madness
  2013-04-20  7:37 [PATCH] relay: fix timer madness zhangwei(Jovi)
@ 2013-04-23 21:28 ` Andrew Morton
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2013-04-23 21:28 UTC (permalink / raw)
  To: zhangwei(Jovi)
  Cc: Steven Rostedt, Ingo Molnar, linux-kernel@vger.kernel.org,
	linux-rt-users

On Sat, 20 Apr 2013 15:37:08 +0800 "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> wrote:

> Hi,
> 
> Ingo, Steven, I get this patch from 3.4 preempt-rt patch set, It seems that this patch
> fix relayfs bug not only for rt kernel, but also for mainline.
> 
> When I'm using below ktap script to tracing all event tracepoints, without this patch,
> the system will hang in few seconds, the patch indeed fix the problem as the changelog pointed.
> 
> 	function eventfun (e) {
>         	printf("%d %d\t%s\t%s", cpu(), pid(), execname(), e.annotate)
> 	}
> 
> 	kdebug.probe("tp:", eventfun)
> 
> 	kdebug.probe_end(function () {
>         	printf("probe end\n")
> 	})
> 
> 
> This patch is old, I can found the original patch discussion in 2007.
> 	http://marc.info/?l=linux-kernel&m=118544794717162&w=2
> (In that mail thread, the patch didn't fix that problem, but it fix the problem I encountered now)
> 
> I hope you can remember this :)
> 
> so why we didn't commit this patch into mainline? any concern?
> 
> Thanks.
> 
> ------------------------------------->
> Subject: relay: fix timer madness
> From: Ingo Molnar <mingo@elte.hu>
> 
> remove timer calls (!!!) from deep within the tracing infrastructure.
> This was totally bogus code that can cause lockups and worse.
> Poll the buffer every 2 jiffies for now.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>

(This version of the patch should have your signed-off-by)

> @@ -604,15 +609,6 @@ size_t relay_switch_subbuf(struct rchan_
>  		buf->subbufs_produced++;
>  		buf->dentry->d_inode->i_size += buf->chan->subbuf_size -
>  			buf->padding[old_subbuf];
> -		smp_mb();
> -		if (waitqueue_active(&buf->read_wait))
> -			/*
> -			 * Calling wake_up_interruptible() from here
> -			 * will deadlock if we happen to be logging
> -			 * from the scheduler (trying to re-grab
> -			 * rq->lock), so defer it.
> -			 */
> -			__mod_timer(&buf->timer, jiffies + 1);
>  	}

We've "fixed" the printk-inside-runqueue-lock deadlocks via icky
hackery in wake_up_klogd().  I guess we could do it the same way here. 
But the two approaches are conceptually very similar and this version
in relay.c is much simpler.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] relay: fix timer madness
@ 2013-07-31 10:01 zhangwei(Jovi)
  2013-07-31 10:38 ` zhangwei(Jovi)
  0 siblings, 1 reply; 4+ messages in thread
From: zhangwei(Jovi) @ 2013-07-31 10:01 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: linux-kernel@vger.kernel.org, Dan Carpenter, Steven Rostedt,
	Jens Axboe, Al Viro, Eric Dumazet

From: Ingo Molnar <mingo@elte.hu>

When I'm using ktap script to tracing all event tracepoints by relay
transport, without this patch, the system will hang in few seconds.

I found the original patch discussion in 2007.
http://marc.info/?l=linux-kernel&m=118544794717162&w=2
(In that mail thread, the patch didn't fix that problem, but it fix
the problem I encountered now)

Changed from origina patch:
 - mod timer interval changed from jiffies+1 to HZ/10, as Ingo suggested.
 - mod timer interval changed from HZ/10 to jiffies + HZ/10, suggested
   by Dan Carpenter, since mod_timer() takes an offset for interval.

Original patch changelog from Ingo in 2007:

  Remove timer calls (!!!) from deep within the tracing infrastructure.
  This was totally bogus code that can cause lockups and worse.
  Poll the buffer every 2 jiffies for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
---
 kernel/relay.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index b91488b..42d6de3 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -339,6 +339,10 @@ static void wakeup_readers(unsigned long data)
 {
 	struct rchan_buf *buf = (struct rchan_buf *)data;
 	wake_up_interruptible(&buf->read_wait);
+	/*
+	 * Stupid polling for now:
+	 */
+	mod_timer(&buf->timer, jiffies + HZ / 10);
 }

 /**
@@ -356,6 +360,7 @@ static void __relay_reset(struct rchan_buf *buf, unsigned int init)
 		init_waitqueue_head(&buf->read_wait);
 		kref_init(&buf->kref);
 		setup_timer(&buf->timer, wakeup_readers, (unsigned long)buf);
+		mod_timer(&buf->timer, jiffies + HZ / 10);
 	} else
 		del_timer_sync(&buf->timer);

@@ -739,15 +744,6 @@ size_t relay_switch_subbuf(struct rchan_buf *buf, size_t length)
 		else
 			buf->early_bytes += buf->chan->subbuf_size -
 					    buf->padding[old_subbuf];
-		smp_mb();
-		if (waitqueue_active(&buf->read_wait))
-			/*
-			 * Calling wake_up_interruptible() from here
-			 * will deadlock if we happen to be logging
-			 * from the scheduler (trying to re-grab
-			 * rq->lock), so defer it.
-			 */
-			mod_timer(&buf->timer, jiffies + 1);
 	}

 	old = buf->data;
-- 
1.7.9.7



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] relay: fix timer madness
  2013-07-31 10:01 zhangwei(Jovi)
@ 2013-07-31 10:38 ` zhangwei(Jovi)
  0 siblings, 0 replies; 4+ messages in thread
From: zhangwei(Jovi) @ 2013-07-31 10:38 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: linux-kernel@vger.kernel.org, Dan Carpenter, Steven Rostedt,
	Jens Axboe, Al Viro, Eric Dumazet

On 2013/7/31 18:01, zhangwei(Jovi) wrote:
> From: Ingo Molnar <mingo@elte.hu>
> 
> When I'm using ktap script to tracing all event tracepoints by relay
> transport, without this patch, the system will hang in few seconds.
> 
> I found the original patch discussion in 2007.
> http://marc.info/?l=linux-kernel&m=118544794717162&w=2
> (In that mail thread, the patch didn't fix that problem, but it fix
> the problem I encountered now)
> 
> Changed from origina patch:
>  - mod timer interval changed from jiffies+1 to HZ/10, as Ingo suggested.
>  - mod timer interval changed from HZ/10 to jiffies + HZ/10, suggested
>    by Dan Carpenter, since mod_timer() takes an offset for interval.
> 
> Original patch changelog from Ingo in 2007:
> 
>   Remove timer calls (!!!) from deep within the tracing infrastructure.
>   This was totally bogus code that can cause lockups and worse.
>   Poll the buffer every 2 jiffies for now.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Eric Dumazet <edumazet@google.com>
> ---
Hi Andrew,

How about this patch? this version folded the suggestion from Ingo and Dan.

jovi


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-07-31 10:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-20  7:37 [PATCH] relay: fix timer madness zhangwei(Jovi)
2013-04-23 21:28 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2013-07-31 10:01 zhangwei(Jovi)
2013-07-31 10:38 ` zhangwei(Jovi)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).