* [PATCH] ring-buffer: Fix polling on trace_pipe @ 2014-06-10 6:06 Martin Lau 2014-06-10 15:49 ` Steven Rostedt 2014-07-15 17:32 ` Chris Mason 0 siblings, 2 replies; 10+ messages in thread From: Martin Lau @ 2014-06-10 6:06 UTC (permalink / raw) To: rostedt; +Cc: linux-kernel, clm ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Signed-off-by: Martin Lau <kafai@fb.com> --- kernel/trace/ring_buffer.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index fd12cc5..a6e64e8 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu, struct ring_buffer_per_cpu *cpu_buffer; struct rb_irq_work *work; - if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) || - (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu))) - return POLLIN | POLLRDNORM; - if (cpu == RING_BUFFER_ALL_CPUS) work = &buffer->irq_work; else { -- 1.8.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-10 6:06 [PATCH] ring-buffer: Fix polling on trace_pipe Martin Lau @ 2014-06-10 15:49 ` Steven Rostedt 2014-06-11 5:58 ` Martin Lau 2014-07-15 17:32 ` Chris Mason 1 sibling, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2014-06-10 15:49 UTC (permalink / raw) To: Martin Lau; +Cc: linux-kernel, clm [-- Attachment #1: Type: text/plain, Size: 2430 bytes --] On Mon, 9 Jun 2014 23:06:42 -0700 Martin Lau <kafai@fb.com> wrote: > ring_buffer_poll_wait() should always put the poll_table to its wait_queue > even there is immediate data available. Otherwise, the following epoll and > read sequence will eventually hang forever: > > 1. Put some data to make the trace_pipe ring_buffer read ready first > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) > 3. epoll_wait() > 4. read(trace_pipe_fd) till EAGAIN > 5. Add some more data to the trace_pipe ring_buffer > 6. epoll_wait() -> this epoll_wait() will block forever > > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, > ring_buffer_poll_wait() returns immediately without adding poll_table, > which has poll_table->_qproc pointing to ep_poll_callback(), to its > wait_queue. > ~ During the epoll_wait() call in step 3 and step 6, > ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue > because the poll_table->_qproc is NULL and it is how epoll works. > ~ When there is new data available in step 6, ring_buffer does not know > it has to call ep_poll_callback() because it is not in its wait queue. > Hence, block forever. > > Other poll implementation seems to call poll_wait() unconditionally as the very > first thing to do. For example, tcp_poll() in tcp.c. I'm trying to see the effect of this bug, but can't seem to reproduce it. Maybe I did something wrong. Attached is a test program I wrote trying to follow your instructions. I don't use epoll, so perhaps I didn't use it correctly. Can you modify it to show me the problem this is trying to fix. That is, without this patch it hangs, but with the patch it does not. Thanks! -- Steve > > Signed-off-by: Martin Lau <kafai@fb.com> > --- > kernel/trace/ring_buffer.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index fd12cc5..a6e64e8 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu, > struct ring_buffer_per_cpu *cpu_buffer; > struct rb_irq_work *work; > > - if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) || > - (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu))) > - return POLLIN | POLLRDNORM; > - > if (cpu == RING_BUFFER_ALL_CPUS) > work = &buffer->irq_work; > else { [-- Attachment #2: ftrace-test-epoll.c --] [-- Type: text/x-c++src, Size: 1910 bytes --] #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/epoll.h> #include <fcntl.h> #include <unistd.h> static const char * debugfs_list[] = { "/debug/tracing", "/sys/kernel/debug/tracing", "/d/tracing", NULL, }; static const char *debugfs; static int markfd; static int trace_pipe_fd; static const char *find_debugfs(void) { struct stat st; int i; int r; for (i = 0; debugfs_list[i]; i++) { r = stat(debugfs_list[i], &st); if (r < 0) continue; if (S_ISDIR(st.st_mode)) return debugfs_list[i]; } return NULL; } static char * make_path(const char *file) { char *path; int size; size = strlen(debugfs) + strlen(file) + 2; path = malloc(size); if (!path) { perror("malloc"); exit(-1); } sprintf(path, "%s/%s", debugfs, file); return path; } static void mark_write(const char *str) { write(markfd, str, strlen(str)); } static void read_trace_pipe(void) { char buf[1024]; int r; while ((r = read(trace_pipe_fd, buf, 1024)) > 0) printf("%.*s", r, buf); } int main (int argc, char **argv) { struct epoll_event ee; char *marker; char *pipe; int efd; int ret; debugfs = find_debugfs(); if (!debugfs) { fprintf(stderr, "Could not find debugfs\n"); exit(-1); } marker = make_path("trace_marker"); pipe = make_path("trace_pipe"); markfd = open(marker, O_WRONLY); if (markfd < 0) { perror("marker"); exit(-1); } trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); if (trace_pipe_fd < 0) { perror("trace_pipe"); exit(-1); } efd = epoll_create(1); if (efd < 0) { perror("epoll_create"); exit(-1); } mark_write("some data"); ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); if (ret < 0) { perror("epoll_ctl"); exit(-1); } epoll_wait(efd, &ee, 1, 0); read_trace_pipe(); mark_write("more data"); epoll_wait(efd, &ee, 1, 0); read_trace_pipe(); return 0; } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-10 15:49 ` Steven Rostedt @ 2014-06-11 5:58 ` Martin Lau 2014-06-26 18:34 ` Martin Lau 0 siblings, 1 reply; 10+ messages in thread From: Martin Lau @ 2014-06-11 5:58 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, clm [-- Attachment #1: Type: text/plain, Size: 2907 bytes --] Hi Steve, Attached is the modified test program. Here is the sample output: localhost ~ # ./ftrace-test-epoll-kafai <...>-1857 [000] ...1 720.174295: tracing_mark_write: some data 1857: waitting for more data...... 1858: written more data <block here> Thanks, --Martin On Tue, Jun 10, 2014 at 11:49:15AM -0400, Steven Rostedt wrote: > On Mon, 9 Jun 2014 23:06:42 -0700 > Martin Lau <kafai@fb.com> wrote: > > > ring_buffer_poll_wait() should always put the poll_table to its wait_queue > > even there is immediate data available. Otherwise, the following epoll and > > read sequence will eventually hang forever: > > > > 1. Put some data to make the trace_pipe ring_buffer read ready first > > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) > > 3. epoll_wait() > > 4. read(trace_pipe_fd) till EAGAIN > > 5. Add some more data to the trace_pipe ring_buffer > > 6. epoll_wait() -> this epoll_wait() will block forever > > > > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, > > ring_buffer_poll_wait() returns immediately without adding poll_table, > > which has poll_table->_qproc pointing to ep_poll_callback(), to its > > wait_queue. > > ~ During the epoll_wait() call in step 3 and step 6, > > ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue > > because the poll_table->_qproc is NULL and it is how epoll works. > > ~ When there is new data available in step 6, ring_buffer does not know > > it has to call ep_poll_callback() because it is not in its wait queue. > > Hence, block forever. > > > > Other poll implementation seems to call poll_wait() unconditionally as the very > > first thing to do. For example, tcp_poll() in tcp.c. > > I'm trying to see the effect of this bug, but can't seem to reproduce > it. Maybe I did something wrong. Attached is a test program I wrote > trying to follow your instructions. I don't use epoll, so perhaps I > didn't use it correctly. > > Can you modify it to show me the problem this is trying to fix. That > is, without this patch it hangs, but with the patch it does not. > > Thanks! > > -- Steve > > > > > Signed-off-by: Martin Lau <kafai@fb.com> > > --- > > kernel/trace/ring_buffer.c | 4 ---- > > 1 file changed, 4 deletions(-) > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > index fd12cc5..a6e64e8 100644 > > --- a/kernel/trace/ring_buffer.c > > +++ b/kernel/trace/ring_buffer.c > > @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu, > > struct ring_buffer_per_cpu *cpu_buffer; > > struct rb_irq_work *work; > > > > - if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) || > > - (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu))) > > - return POLLIN | POLLRDNORM; > > - > > if (cpu == RING_BUFFER_ALL_CPUS) > > work = &buffer->irq_work; > > else { > [-- Attachment #2: ftrace-test-epoll-kafai.c --] [-- Type: text/plain, Size: 2191 bytes --] #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/epoll.h> #include <fcntl.h> #include <unistd.h> #include <assert.h> static const char * debugfs_list[] = { "/debug/tracing", "/sys/kernel/debug/tracing", "/d/tracing", NULL, }; static const char *debugfs; static int markfd; static int trace_pipe_fd; static const char *find_debugfs(void) { struct stat st; int i; int r; for (i = 0; debugfs_list[i]; i++) { r = stat(debugfs_list[i], &st); if (r < 0) continue; if (S_ISDIR(st.st_mode)) return debugfs_list[i]; } return NULL; } static char * make_path(const char *file) { char *path; int size; size = strlen(debugfs) + strlen(file) + 2; path = malloc(size); if (!path) { perror("malloc"); exit(-1); } sprintf(path, "%s/%s", debugfs, file); return path; } static void mark_write(const char *str) { write(markfd, str, strlen(str)); } static void read_trace_pipe(void) { char buf[1024]; int r; while ((r = read(trace_pipe_fd, buf, 1024)) > 0) printf("%.*s", r, buf); } int main (int argc, char **argv) { struct epoll_event ee; char *marker; char *pipe; int efd; int ret; pid_t dwrt_pid; debugfs = find_debugfs(); if (!debugfs) { fprintf(stderr, "Could not find debugfs\n"); exit(-1); } marker = make_path("trace_marker"); pipe = make_path("trace_pipe"); markfd = open(marker, O_WRONLY); if (markfd < 0) { perror("marker"); exit(-1); } trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); if (trace_pipe_fd < 0) { perror("trace_pipe"); exit(-1); } efd = epoll_create(1); if (efd < 0) { perror("epoll_create"); exit(-1); } mark_write("some data"); ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); if (ret < 0) { perror("epoll_ctl"); exit(-1); } epoll_wait(efd, &ee, 1, -1); read_trace_pipe(); dwrt_pid = fork(); assert(dwrt_pid != -1); if (dwrt_pid == 0) { sleep(10); mark_write("more data"); printf("%d: written more data\n", getpid()); } else { printf("%d: waitting for more data......\n", getpid()); epoll_wait(efd, &ee, 1, -1); printf("%d: got more data\n", getpid()); read_trace_pipe(); } return 0; } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-11 5:58 ` Martin Lau @ 2014-06-26 18:34 ` Martin Lau 2014-06-27 0:53 ` Steven Rostedt 0 siblings, 1 reply; 10+ messages in thread From: Martin Lau @ 2014-06-26 18:34 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, clm Hi Steve, Can the modified test program reproduce the problem in your test setup? Thanks, --Martin On Tue, Jun 10, 2014 at 10:58:14PM -0700, Martin Lau wrote: > Hi Steve, > > Attached is the modified test program. Here is the sample output: > > localhost ~ # ./ftrace-test-epoll-kafai > <...>-1857 [000] ...1 720.174295: tracing_mark_write: some data > 1857: waitting for more data...... > 1858: written more data > <block here> > > Thanks, > --Martin > > On Tue, Jun 10, 2014 at 11:49:15AM -0400, Steven Rostedt wrote: > > On Mon, 9 Jun 2014 23:06:42 -0700 > > Martin Lau <kafai@fb.com> wrote: > > > > > ring_buffer_poll_wait() should always put the poll_table to its wait_queue > > > even there is immediate data available. Otherwise, the following epoll and > > > read sequence will eventually hang forever: > > > > > > 1. Put some data to make the trace_pipe ring_buffer read ready first > > > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) > > > 3. epoll_wait() > > > 4. read(trace_pipe_fd) till EAGAIN > > > 5. Add some more data to the trace_pipe ring_buffer > > > 6. epoll_wait() -> this epoll_wait() will block forever > > > > > > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, > > > ring_buffer_poll_wait() returns immediately without adding poll_table, > > > which has poll_table->_qproc pointing to ep_poll_callback(), to its > > > wait_queue. > > > ~ During the epoll_wait() call in step 3 and step 6, > > > ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue > > > because the poll_table->_qproc is NULL and it is how epoll works. > > > ~ When there is new data available in step 6, ring_buffer does not know > > > it has to call ep_poll_callback() because it is not in its wait queue. > > > Hence, block forever. > > > > > > Other poll implementation seems to call poll_wait() unconditionally as the very > > > first thing to do. For example, tcp_poll() in tcp.c. > > > > I'm trying to see the effect of this bug, but can't seem to reproduce > > it. Maybe I did something wrong. Attached is a test program I wrote > > trying to follow your instructions. I don't use epoll, so perhaps I > > didn't use it correctly. > > > > Can you modify it to show me the problem this is trying to fix. That > > is, without this patch it hangs, but with the patch it does not. > > > > Thanks! > > > > -- Steve > > > > > > > > Signed-off-by: Martin Lau <kafai@fb.com> > > > --- > > > kernel/trace/ring_buffer.c | 4 ---- > > > 1 file changed, 4 deletions(-) > > > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > > index fd12cc5..a6e64e8 100644 > > > --- a/kernel/trace/ring_buffer.c > > > +++ b/kernel/trace/ring_buffer.c > > > @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu, > > > struct ring_buffer_per_cpu *cpu_buffer; > > > struct rb_irq_work *work; > > > > > > - if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) || > > > - (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu))) > > > - return POLLIN | POLLRDNORM; > > > - > > > if (cpu == RING_BUFFER_ALL_CPUS) > > > work = &buffer->irq_work; > > > else { > > > > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <sys/epoll.h> > #include <fcntl.h> > #include <unistd.h> > #include <assert.h> > > static const char * debugfs_list[] = { > "/debug/tracing", > "/sys/kernel/debug/tracing", > "/d/tracing", > NULL, > }; > > static const char *debugfs; > static int markfd; > static int trace_pipe_fd; > > static const char *find_debugfs(void) > { > struct stat st; > int i; > int r; > > for (i = 0; debugfs_list[i]; i++) { > r = stat(debugfs_list[i], &st); > if (r < 0) > continue; > if (S_ISDIR(st.st_mode)) > return debugfs_list[i]; > } > return NULL; > } > > static char * make_path(const char *file) > { > char *path; > int size; > > size = strlen(debugfs) + strlen(file) + 2; > path = malloc(size); > if (!path) { > perror("malloc"); > exit(-1); > } > sprintf(path, "%s/%s", debugfs, file); > return path; > } > > static void mark_write(const char *str) > { > write(markfd, str, strlen(str)); > } > > static void read_trace_pipe(void) > { > char buf[1024]; > int r; > > while ((r = read(trace_pipe_fd, buf, 1024)) > 0) > printf("%.*s", r, buf); > } > > int main (int argc, char **argv) > { > struct epoll_event ee; > char *marker; > char *pipe; > int efd; > int ret; > pid_t dwrt_pid; > > debugfs = find_debugfs(); > if (!debugfs) { > fprintf(stderr, "Could not find debugfs\n"); > exit(-1); > } > > marker = make_path("trace_marker"); > pipe = make_path("trace_pipe"); > > markfd = open(marker, O_WRONLY); > if (markfd < 0) { > perror("marker"); > exit(-1); > } > trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); > if (trace_pipe_fd < 0) { > perror("trace_pipe"); > exit(-1); > } > > efd = epoll_create(1); > if (efd < 0) { > perror("epoll_create"); > exit(-1); > } > > mark_write("some data"); > ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); > if (ret < 0) { > perror("epoll_ctl"); > exit(-1); > } > epoll_wait(efd, &ee, 1, -1); > read_trace_pipe(); > dwrt_pid = fork(); > assert(dwrt_pid != -1); > if (dwrt_pid == 0) { > sleep(10); > mark_write("more data"); > printf("%d: written more data\n", getpid()); > } else { > printf("%d: waitting for more data......\n", getpid()); > epoll_wait(efd, &ee, 1, -1); > printf("%d: got more data\n", getpid()); > read_trace_pipe(); > } > return 0; > } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-26 18:34 ` Martin Lau @ 2014-06-27 0:53 ` Steven Rostedt 2014-06-27 6:52 ` Martin Lau 0 siblings, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2014-06-27 0:53 UTC (permalink / raw) To: Martin Lau; +Cc: linux-kernel, clm On Thu, 26 Jun 2014 11:34:46 -0700 Martin Lau <kafai@fb.com> wrote: > Hi Steve, > > Can the modified test program reproduce the problem in your test setup? Ah sorry, got distracted by other work. OK, I just tried it out, and here's my results: I ran you code with my current kernel and this is what I got: # ./ftrace-test-epoll-kafai <...>-3183 [002] ...1 81.777891: tracing_mark_write: some data 3183: waitting for more data...... 3184: written more data And it just hung there. Then I applied your patch, compiled and booted it, and ran the test again, and I got this: # ./ftrace-test-epoll-kafai It just hung there. No forward progress. I don't think that was the result you intended. -- Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-27 0:53 ` Steven Rostedt @ 2014-06-27 6:52 ` Martin Lau 2014-07-10 22:20 ` Martin Lau 0 siblings, 1 reply; 10+ messages in thread From: Martin Lau @ 2014-06-27 6:52 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, clm [-- Attachment #1: Type: text/plain, Size: 1219 bytes --] Hi Steve, I retried the test program (with the kernel patch). It does block from time to time. I spotted the ee.events is not set to EPOLLIN before calling epll_ctl(eft, EPOLL_CTL_ADD,...). I fixed it and ran the test in a bash-loop. I have the kafai-2 version attached (with some more logging in case if it still blocks). Can you retry? Thanks, Martin On Thu, Jun 26, 2014 at 08:53:27PM -0400, Steven Rostedt wrote: > On Thu, 26 Jun 2014 11:34:46 -0700 > Martin Lau <kafai@fb.com> wrote: > > > Hi Steve, > > > > Can the modified test program reproduce the problem in your test setup? > > Ah sorry, got distracted by other work. > > OK, I just tried it out, and here's my results: > > I ran you code with my current kernel and this is what I got: > > # ./ftrace-test-epoll-kafai > <...>-3183 [002] ...1 81.777891: tracing_mark_write: > some data 3183: waitting for more data...... > 3184: written more data > > And it just hung there. > > > Then I applied your patch, compiled and booted it, and ran the test > again, and I got this: > > # ./ftrace-test-epoll-kafai > > It just hung there. No forward progress. > > I don't think that was the result you intended. > > -- Steve > > [-- Attachment #2: ftrace-test-epoll-kafai-2.c --] [-- Type: text/plain, Size: 2392 bytes --] #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/epoll.h> #include <fcntl.h> #include <unistd.h> #include <assert.h> static const char * debugfs_list[] = { "/debug/tracing", "/sys/kernel/debug/tracing", "/d/tracing", NULL, }; static const char *debugfs; static int markfd; static int trace_pipe_fd; static void alog(const char *name, int ret) { printf("%d: %s: %d\n", getpid(), name, ret); } static const char *find_debugfs(void) { struct stat st; int i; int r; for (i = 0; debugfs_list[i]; i++) { r = stat(debugfs_list[i], &st); if (r < 0) continue; if (S_ISDIR(st.st_mode)) return debugfs_list[i]; } return NULL; } static char * make_path(const char *file) { char *path; int size; size = strlen(debugfs) + strlen(file) + 2; path = malloc(size); if (!path) { perror("malloc"); exit(-1); } sprintf(path, "%s/%s", debugfs, file); return path; } static void mark_write(const char *str) { int ret; ret = write(markfd, str, strlen(str)); alog("write(markfd)", ret); } static void read_trace_pipe(void) { char buf[1024]; int r; while ((r = read(trace_pipe_fd, buf, 1024)) > 0) printf("%.*s", r, buf); } int main (int argc, char **argv) { struct epoll_event ee; char *marker; char *pipe; int efd; int ret; pid_t dwrt_pid; debugfs = find_debugfs(); if (!debugfs) { fprintf(stderr, "Could not find debugfs\n"); exit(-1); } marker = make_path("trace_marker"); pipe = make_path("trace_pipe"); markfd = open(marker, O_WRONLY); if (markfd < 0) { perror("marker"); exit(-1); } trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); if (trace_pipe_fd < 0) { perror("trace_pipe"); exit(-1); } efd = epoll_create(1); if (efd < 0) { perror("epoll_create"); exit(-1); } mark_write("some data"); memset(&ee, 0, sizeof(ee)); ee.events = EPOLLIN; ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); if (ret < 0) { perror("epoll_ctl"); exit(-1); } alog("waiting data......", 0); ret = epoll_wait(efd, &ee, 1, -1); alog("epoll_wait()", ret); read_trace_pipe(); dwrt_pid = fork(); assert(dwrt_pid != -1); if (dwrt_pid == 0) { sleep(10); mark_write("more data"); } else { alog("waiting form more data......", 0); ret = epoll_wait(efd, &ee, 1, -1); alog("epoll_wait()", ret); read_trace_pipe(); wait(NULL); } return 0; } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-27 6:52 ` Martin Lau @ 2014-07-10 22:20 ` Martin Lau 2014-07-15 19:46 ` Steven Rostedt 0 siblings, 1 reply; 10+ messages in thread From: Martin Lau @ 2014-07-10 22:20 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, clm Hi Steve, Do you have a chance to give it another try? Thanks, Martin On Thu, Jun 26, 2014 at 11:52:44PM -0700, Martin Lau wrote: > Hi Steve, > > I retried the test program (with the kernel patch). It does block from > time to time. I spotted the ee.events is not set to EPOLLIN before > calling epll_ctl(eft, EPOLL_CTL_ADD,...). I fixed it and ran > the test in a bash-loop. > > I have the kafai-2 version attached (with some more logging in case > if it still blocks). > > Can you retry? > > Thanks, > Martin > > On Thu, Jun 26, 2014 at 08:53:27PM -0400, Steven Rostedt wrote: > > On Thu, 26 Jun 2014 11:34:46 -0700 > > Martin Lau <kafai@fb.com> wrote: > > > > > Hi Steve, > > > > > > Can the modified test program reproduce the problem in your test setup? > > > > Ah sorry, got distracted by other work. > > > > OK, I just tried it out, and here's my results: > > > > I ran you code with my current kernel and this is what I got: > > > > # ./ftrace-test-epoll-kafai > > <...>-3183 [002] ...1 81.777891: tracing_mark_write: > > some data 3183: waitting for more data...... > > 3184: written more data > > > > And it just hung there. > > > > > > Then I applied your patch, compiled and booted it, and ran the test > > again, and I got this: > > > > # ./ftrace-test-epoll-kafai > > > > It just hung there. No forward progress. > > > > I don't think that was the result you intended. > > > > -- Steve > > > > > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <sys/epoll.h> > #include <fcntl.h> > #include <unistd.h> > #include <assert.h> > > static const char * debugfs_list[] = { > "/debug/tracing", > "/sys/kernel/debug/tracing", > "/d/tracing", > NULL, > }; > > static const char *debugfs; > static int markfd; > static int trace_pipe_fd; > > static void alog(const char *name, int ret) > { > printf("%d: %s: %d\n", getpid(), name, ret); > } > > static const char *find_debugfs(void) > { > struct stat st; > int i; > int r; > > for (i = 0; debugfs_list[i]; i++) { > r = stat(debugfs_list[i], &st); > if (r < 0) > continue; > if (S_ISDIR(st.st_mode)) > return debugfs_list[i]; > } > return NULL; > } > > static char * make_path(const char *file) > { > char *path; > int size; > > size = strlen(debugfs) + strlen(file) + 2; > path = malloc(size); > if (!path) { > perror("malloc"); > exit(-1); > } > sprintf(path, "%s/%s", debugfs, file); > return path; > } > > static void mark_write(const char *str) > { > int ret; > ret = write(markfd, str, strlen(str)); > alog("write(markfd)", ret); > } > > static void read_trace_pipe(void) > { > char buf[1024]; > int r; > > while ((r = read(trace_pipe_fd, buf, 1024)) > 0) > printf("%.*s", r, buf); > } > > int main (int argc, char **argv) > { > struct epoll_event ee; > char *marker; > char *pipe; > int efd; > int ret; > pid_t dwrt_pid; > > debugfs = find_debugfs(); > if (!debugfs) { > fprintf(stderr, "Could not find debugfs\n"); > exit(-1); > } > > marker = make_path("trace_marker"); > pipe = make_path("trace_pipe"); > > markfd = open(marker, O_WRONLY); > if (markfd < 0) { > perror("marker"); > exit(-1); > } > trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); > if (trace_pipe_fd < 0) { > perror("trace_pipe"); > exit(-1); > } > > efd = epoll_create(1); > if (efd < 0) { > perror("epoll_create"); > exit(-1); > } > > mark_write("some data"); > memset(&ee, 0, sizeof(ee)); > ee.events = EPOLLIN; > ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); > if (ret < 0) { > perror("epoll_ctl"); > exit(-1); > } > alog("waiting data......", 0); > ret = epoll_wait(efd, &ee, 1, -1); > alog("epoll_wait()", ret); > read_trace_pipe(); > dwrt_pid = fork(); > assert(dwrt_pid != -1); > if (dwrt_pid == 0) { > sleep(10); > mark_write("more data"); > } else { > alog("waiting form more data......", 0); > ret = epoll_wait(efd, &ee, 1, -1); > alog("epoll_wait()", ret); > read_trace_pipe(); > wait(NULL); > } > return 0; > } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-07-10 22:20 ` Martin Lau @ 2014-07-15 19:46 ` Steven Rostedt 2014-07-15 20:20 ` Martin Lau 0 siblings, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2014-07-15 19:46 UTC (permalink / raw) To: Martin Lau; +Cc: linux-kernel, clm [-- Attachment #1: Type: text/plain, Size: 503 bytes --] On Thu, 10 Jul 2014 15:20:30 -0700 Martin Lau <kafai@fb.com> wrote: > Hi Steve, > > Do you have a chance to give it another try? > OK, I finally got around to testing it. Yep I see your point. Do you think it should go to 3.16 and stable? I can add it to my next git pull request. Also, I modified your test such that it wouldn't hang on failure, but detects that it hung and returns a exit value to add this to my testing. That is, it exits with zero on success and non zero on failure. -- Steve [-- Attachment #2: ftrace-test-epoll-kafai-2.c --] [-- Type: text/x-c++src, Size: 2759 bytes --] #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/stat.h> #include <sys/epoll.h> #include <fcntl.h> #include <unistd.h> #include <assert.h> #include <signal.h> #include <errno.h> static const char * debugfs_list[] = { "/debug/tracing", "/sys/kernel/debug/tracing", "/d/tracing", NULL, }; static const char *debugfs; static int markfd; static int trace_pipe_fd; static void alog(const char *name, int ret) { printf("%d: %s: %d\n", getpid(), name, ret); } static const char *find_debugfs(void) { struct stat st; int i; int r; for (i = 0; debugfs_list[i]; i++) { r = stat(debugfs_list[i], &st); if (r < 0) continue; if (S_ISDIR(st.st_mode)) return debugfs_list[i]; } return NULL; } static char * make_path(const char *file) { char *path; int size; size = strlen(debugfs) + strlen(file) + 2; path = malloc(size); if (!path) { perror("malloc"); exit(-1); } sprintf(path, "%s/%s", debugfs, file); return path; } static void mark_write(const char *str) { int ret; ret = write(markfd, str, strlen(str)); alog("write(markfd)", ret); } static void read_trace_pipe(void) { char buf[1024]; int r; while ((r = read(trace_pipe_fd, buf, 1024)) > 0) printf("%.*s", r, buf); } int main (int argc, char **argv) { struct epoll_event ee; char *marker; char *pipe; int efd; int ret; pid_t dwrt_pid; debugfs = find_debugfs(); if (!debugfs) { fprintf(stderr, "Could not find debugfs\n"); exit(-1); } marker = make_path("trace_marker"); pipe = make_path("trace_pipe"); markfd = open(marker, O_WRONLY); if (markfd < 0) { perror("marker"); exit(-1); } trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); if (trace_pipe_fd < 0) { perror("trace_pipe"); exit(-1); } efd = epoll_create(1); if (efd < 0) { perror("epoll_create"); exit(-1); } mark_write("some data"); memset(&ee, 0, sizeof(ee)); ee.events = EPOLLIN; ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); if (ret < 0) { perror("epoll_ctl"); exit(-1); } alog("waiting data......", 0); ret = epoll_wait(efd, &ee, 1, -1); alog("epoll_wait()", ret); read_trace_pipe(); dwrt_pid = fork(); assert(dwrt_pid != -1); if (dwrt_pid > 0) { int status; sleep(10); mark_write("more data"); sleep(10); ret = waitpid(dwrt_pid, &status, WNOHANG); if (ret != dwrt_pid) { alog("Poll never finished!", 0); kill(dwrt_pid, 9); exit(-1); } if (WEXITSTATUS(status) != 0) { alog("Something failed on polling!", WEXITSTATUS(status)); exit(-1); } } else { alog("waiting form more data......", 0); ret = epoll_wait(efd, &ee, 1, -1); alog("epoll_wait()", ret); read_trace_pipe(); if (ret < 0) exit(errno); } exit (0); } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-07-15 19:46 ` Steven Rostedt @ 2014-07-15 20:20 ` Martin Lau 0 siblings, 0 replies; 10+ messages in thread From: Martin Lau @ 2014-07-15 20:20 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, clm Yes, I think it should go to both stable and 3.16. Thanks, --Martin On Tue, Jul 15, 2014 at 03:46:12PM -0400, Steven Rostedt wrote: > On Thu, 10 Jul 2014 15:20:30 -0700 > Martin Lau <kafai@fb.com> wrote: > > > Hi Steve, > > > > Do you have a chance to give it another try? > > > > OK, I finally got around to testing it. Yep I see your point. Do you > think it should go to 3.16 and stable? I can add it to my next git pull > request. > > Also, I modified your test such that it wouldn't hang on failure, but > detects that it hung and returns a exit value to add this to my > testing. That is, it exits with zero on success and non zero on failure. > > -- Steve > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <sys/types.h> > #include <sys/wait.h> > #include <sys/stat.h> > #include <sys/epoll.h> > #include <fcntl.h> > #include <unistd.h> > #include <assert.h> > #include <signal.h> > #include <errno.h> > > static const char * debugfs_list[] = { > "/debug/tracing", > "/sys/kernel/debug/tracing", > "/d/tracing", > NULL, > }; > > static const char *debugfs; > static int markfd; > static int trace_pipe_fd; > > static void alog(const char *name, int ret) > { > printf("%d: %s: %d\n", getpid(), name, ret); > } > > static const char *find_debugfs(void) > { > struct stat st; > int i; > int r; > > for (i = 0; debugfs_list[i]; i++) { > r = stat(debugfs_list[i], &st); > if (r < 0) > continue; > if (S_ISDIR(st.st_mode)) > return debugfs_list[i]; > } > return NULL; > } > > static char * make_path(const char *file) > { > char *path; > int size; > > size = strlen(debugfs) + strlen(file) + 2; > path = malloc(size); > if (!path) { > perror("malloc"); > exit(-1); > } > sprintf(path, "%s/%s", debugfs, file); > return path; > } > > static void mark_write(const char *str) > { > int ret; > ret = write(markfd, str, strlen(str)); > alog("write(markfd)", ret); > } > > static void read_trace_pipe(void) > { > char buf[1024]; > int r; > > while ((r = read(trace_pipe_fd, buf, 1024)) > 0) > printf("%.*s", r, buf); > } > > int main (int argc, char **argv) > { > struct epoll_event ee; > char *marker; > char *pipe; > int efd; > int ret; > pid_t dwrt_pid; > > debugfs = find_debugfs(); > if (!debugfs) { > fprintf(stderr, "Could not find debugfs\n"); > exit(-1); > } > > marker = make_path("trace_marker"); > pipe = make_path("trace_pipe"); > > markfd = open(marker, O_WRONLY); > if (markfd < 0) { > perror("marker"); > exit(-1); > } > trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK); > if (trace_pipe_fd < 0) { > perror("trace_pipe"); > exit(-1); > } > > efd = epoll_create(1); > if (efd < 0) { > perror("epoll_create"); > exit(-1); > } > > mark_write("some data"); > memset(&ee, 0, sizeof(ee)); > ee.events = EPOLLIN; > ret = epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, &ee); > if (ret < 0) { > perror("epoll_ctl"); > exit(-1); > } > alog("waiting data......", 0); > ret = epoll_wait(efd, &ee, 1, -1); > alog("epoll_wait()", ret); > read_trace_pipe(); > dwrt_pid = fork(); > assert(dwrt_pid != -1); > if (dwrt_pid > 0) { > int status; > sleep(10); > mark_write("more data"); > sleep(10); > ret = waitpid(dwrt_pid, &status, WNOHANG); > if (ret != dwrt_pid) { > alog("Poll never finished!", 0); > kill(dwrt_pid, 9); > exit(-1); > } > if (WEXITSTATUS(status) != 0) { > alog("Something failed on polling!", > WEXITSTATUS(status)); > exit(-1); > } > } else { > alog("waiting form more data......", 0); > ret = epoll_wait(efd, &ee, 1, -1); > alog("epoll_wait()", ret); > read_trace_pipe(); > if (ret < 0) > exit(errno); > } > exit (0); > } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ring-buffer: Fix polling on trace_pipe 2014-06-10 6:06 [PATCH] ring-buffer: Fix polling on trace_pipe Martin Lau 2014-06-10 15:49 ` Steven Rostedt @ 2014-07-15 17:32 ` Chris Mason 1 sibling, 0 replies; 10+ messages in thread From: Chris Mason @ 2014-07-15 17:32 UTC (permalink / raw) To: Martin Lau, rostedt; +Cc: linux-kernel On 06/10/2014 02:06 AM, Martin Lau wrote: > ring_buffer_poll_wait() should always put the poll_table to its wait_queue > even there is immediate data available. Otherwise, the following epoll and > read sequence will eventually hang forever: > > 1. Put some data to make the trace_pipe ring_buffer read ready first > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) > 3. epoll_wait() > 4. read(trace_pipe_fd) till EAGAIN > 5. Add some more data to the trace_pipe ring_buffer > 6. epoll_wait() -> this epoll_wait() will block forever > > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, > ring_buffer_poll_wait() returns immediately without adding poll_table, > which has poll_table->_qproc pointing to ep_poll_callback(), to its > wait_queue. > ~ During the epoll_wait() call in step 3 and step 6, > ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue > because the poll_table->_qproc is NULL and it is how epoll works. > ~ When there is new data available in step 6, ring_buffer does not know > it has to call ep_poll_callback() because it is not in its wait queue. > Hence, block forever. > > Other poll implementation seems to call poll_wait() unconditionally as the very > first thing to do. For example, tcp_poll() in tcp.c. Reviewed-by: Chris Mason <clm@fb.com> This looked horribly wrong to me at first, but Martin walked me through how the polling code is setting up waiters. We have it in production here. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-07-15 20:20 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-10 6:06 [PATCH] ring-buffer: Fix polling on trace_pipe Martin Lau 2014-06-10 15:49 ` Steven Rostedt 2014-06-11 5:58 ` Martin Lau 2014-06-26 18:34 ` Martin Lau 2014-06-27 0:53 ` Steven Rostedt 2014-06-27 6:52 ` Martin Lau 2014-07-10 22:20 ` Martin Lau 2014-07-15 19:46 ` Steven Rostedt 2014-07-15 20:20 ` Martin Lau 2014-07-15 17:32 ` Chris Mason
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox