All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Lau <kafai@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: <linux-kernel@vger.kernel.org>, <clm@fb.com>
Subject: Re: [PATCH] ring-buffer: Fix polling on trace_pipe
Date: Thu, 26 Jun 2014 11:34:46 -0700	[thread overview]
Message-ID: <20140626183445.GA2663@devbig242.prn2.facebook.com> (raw)
In-Reply-To: <20140611055814.GA30265@devbig242.prn2.facebook.com>

Hi Steve,

Can the modified test program reproduce the problem in your test setup?

Thanks,
--Martin

On Tue, Jun 10, 2014 at 10:58:14PM -0700, Martin Lau wrote:
> Hi Steve,
> 
> Attached is the modified test program.  Here is the sample output:
> 
> localhost ~ # ./ftrace-test-epoll-kafai
>            <...>-1857  [000] ...1   720.174295: tracing_mark_write: some data
> 1857: waitting for more data......
> 1858: written more data
> <block here>
> 
> Thanks,
> --Martin
> 
> On Tue, Jun 10, 2014 at 11:49:15AM -0400, Steven Rostedt wrote:
> > On Mon, 9 Jun 2014 23:06:42 -0700
> > Martin Lau <kafai@fb.com> wrote:
> > 
> > > ring_buffer_poll_wait() should always put the poll_table to its wait_queue
> > > even there is immediate data available.  Otherwise, the following epoll and
> > > read sequence will eventually hang forever:
> > > 
> > > 1. Put some data to make the trace_pipe ring_buffer read ready first
> > > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
> > > 3. epoll_wait()
> > > 4. read(trace_pipe_fd) till EAGAIN
> > > 5. Add some more data to the trace_pipe ring_buffer
> > > 6. epoll_wait() -> this epoll_wait() will block forever
> > > 
> > > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
> > >   ring_buffer_poll_wait() returns immediately without adding poll_table,
> > >   which has poll_table->_qproc pointing to ep_poll_callback(), to its
> > >   wait_queue.
> > > ~ During the epoll_wait() call in step 3 and step 6,
> > >   ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
> > >   because the poll_table->_qproc is NULL and it is how epoll works.
> > > ~ When there is new data available in step 6, ring_buffer does not know
> > >   it has to call ep_poll_callback() because it is not in its wait queue.
> > >   Hence, block forever.
> > > 
> > > Other poll implementation seems to call poll_wait() unconditionally as the very
> > > first thing to do.  For example, tcp_poll() in tcp.c.
> > 
> > I'm trying to see the effect of this bug, but can't seem to reproduce
> > it. Maybe I did something wrong. Attached is a test program I wrote
> > trying to follow your instructions. I don't use epoll, so perhaps I
> > didn't use it correctly.
> > 
> > Can you modify it to show me the problem this is trying to fix. That
> > is, without this patch it hangs, but with the patch it does not.
> > 
> > Thanks!
> > 
> > -- Steve
> > 
> > > 
> > > Signed-off-by: Martin Lau <kafai@fb.com>
> > > ---
> > >  kernel/trace/ring_buffer.c | 4 ----
> > >  1 file changed, 4 deletions(-)
> > > 
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index fd12cc5..a6e64e8 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
> > >  	struct ring_buffer_per_cpu *cpu_buffer;
> > >  	struct rb_irq_work *work;
> > >  
> > > -	if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
> > > -	    (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
> > > -		return POLLIN | POLLRDNORM;
> > > -
> > >  	if (cpu == RING_BUFFER_ALL_CPUS)
> > >  		work = &buffer->irq_work;
> > >  	else {
> > 
> 

> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/epoll.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <assert.h>
> 
> static const char * debugfs_list[] = {
> 	"/debug/tracing",
> 	"/sys/kernel/debug/tracing",
> 	"/d/tracing",
> 	NULL,
> };
> 
> static const char *debugfs;
> static int markfd;
> static int trace_pipe_fd;
> 
> static const char *find_debugfs(void)
> {
> 	struct stat st;
> 	int i;
> 	int r;
> 
> 	for (i = 0; debugfs_list[i]; i++) {
> 		r = stat(debugfs_list[i], &st);
> 		if (r < 0)
> 			continue;
> 		if (S_ISDIR(st.st_mode))
> 			return debugfs_list[i];
> 	}
> 	return NULL;
> }
> 
> static char * make_path(const char *file)
> {
> 	char *path;
> 	int size;
> 
> 	size = strlen(debugfs) + strlen(file) + 2;
> 	path = malloc(size);
> 	if (!path) {
> 		perror("malloc");
> 		exit(-1);
> 	}
> 	sprintf(path, "%s/%s", debugfs, file);
> 	return path;
> }
> 
> static void mark_write(const char *str)
> {
> 	write(markfd, str, strlen(str));
> }
> 
> static void read_trace_pipe(void)
> {
> 	char buf[1024];
> 	int r;
> 
> 	while ((r = read(trace_pipe_fd, buf, 1024)) > 0)
> 		printf("%.*s", r, buf);
> }
> 
> int main (int argc, char **argv)
> {
> 	struct epoll_event ee;
> 	char *marker;
> 	char *pipe;
> 	int efd;
> 	int ret;
> 	pid_t dwrt_pid;
> 
> 	debugfs = find_debugfs();
> 	if (!debugfs) {
> 		fprintf(stderr, "Could not find debugfs\n");
> 		exit(-1);
> 	}
> 
> 	marker = make_path("trace_marker");
> 	pipe = make_path("trace_pipe");
> 
> 	markfd = open(marker, O_WRONLY);
> 	if (markfd < 0) {
> 		perror("marker");
> 		exit(-1);
> 	}
> 	trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK);
> 	if (trace_pipe_fd < 0) {
> 		perror("trace_pipe");
> 		exit(-1);
> 	}
> 
> 	efd = epoll_create(1);
> 	if (efd < 0) {
> 		perror("epoll_create");
> 		exit(-1);
> 	}
> 
> 	mark_write("some data");
> 	ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee);
> 	if (ret < 0) {
> 		perror("epoll_ctl");
> 		exit(-1);
> 	}
> 	epoll_wait(efd, &ee, 1, -1);
> 	read_trace_pipe();
> 	dwrt_pid = fork();
> 	assert(dwrt_pid != -1);
> 	if (dwrt_pid == 0) {
> 		sleep(10);
> 		mark_write("more data");
> 		printf("%d: written more data\n", getpid());
> 	} else {
> 		printf("%d: waitting for more data......\n", getpid());
> 		epoll_wait(efd, &ee, 1, -1);
> 		printf("%d: got more data\n", getpid());
> 		read_trace_pipe();
> 	}
> 	return 0;
> }


  reply	other threads:[~2014-06-26 18:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-10  6:06 [PATCH] ring-buffer: Fix polling on trace_pipe Martin Lau
2014-06-10 15:49 ` Steven Rostedt
2014-06-11  5:58   ` Martin Lau
2014-06-26 18:34     ` Martin Lau [this message]
2014-06-27  0:53       ` Steven Rostedt
2014-06-27  6:52         ` Martin Lau
2014-07-10 22:20           ` Martin Lau
2014-07-15 19:46             ` Steven Rostedt
2014-07-15 20:20               ` Martin Lau
2014-07-15 17:32 ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140626183445.GA2663@devbig242.prn2.facebook.com \
    --to=kafai@fb.com \
    --cc=clm@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.