* read() on relayfs channel returns premature 0
@ 2005-03-23 9:02 kingsley
2005-03-23 15:29 ` Tom Zanussi
2005-04-18 1:29 ` Relayfs Question: Use of relay_reset(). Potential race? Kingsley Cheung
0 siblings, 2 replies; 12+ messages in thread
From: kingsley @ 2005-03-23 9:02 UTC (permalink / raw)
To: Tom Zanussi, Karim Yaghmour; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 909 bytes --]
Hi
I'm using relayfs to relay data from a kernel module to user space on
a SuSE 2.6.5 kernel. I'm not absolutely sure what version of relayfs
has been back ported to it.
While reading data from the channel I've been seeing read() return 0
prematurely. However, the 0 does not signify that the file is being
closed for there is still data available afterwards.
I've noticed that zeros occur when roughly one page of data has been
read. I suspect that they occur whenever there is a read across the
relayfs sub-buffers.
Now I understand that this is not the latest release of relayfs (there
are the redux patches, which I have yet to try). Nonetheless I'd like
to know whether this behaviour is deliberate. Is it?
Thanks,
--
Kingsley
P.S. I've been able to get around this by deliberately modifying
do_read() with the attached patch. I'm not absolutely sure its
correct but it seems to work.
[-- Attachment #2: relayfs-premature-zero.patch --]
[-- Type: text/plain, Size: 1340 bytes --]
Index: fs/relayfs/relay.c
===================================================================
RCS file: /export/cvsroot/SuSE-Kernel-2.4.21/fs/relayfs/Attic/relay.c,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 relay.c
--- fs/relayfs/relay.c 18 Feb 2005 00:00:51 -0000 1.1.2.1
+++ fs/relayfs/relay.c 23 Mar 2005 09:00:42 -0000
@@ -1445,26 +1445,22 @@
avail_offset = cur_idx = relay_get_offset(rchan, &max_offset);
+ last_buf_byte_offset = (read_bufno + 1) * buf_size - 1;
if (cur_idx == read_offset) {
if (atomic_read(&rchan->suspended) == 1) {
- read_offset += 1;
+ read_offset = last_buf_byte_offset + 1;
if (read_offset >= max_offset)
read_offset = 0;
*actual_read_offset = read_offset;
} else {
- *new_offset = read_offset;
+ *new_offset = read_offset;
return 0;
}
- } else {
- last_buf_byte_offset = (read_bufno + 1) * buf_size - 1;
- if (read_offset == last_buf_byte_offset) {
- if (unused_bytes != 1) {
- read_offset += 1;
- if (read_offset >= max_offset)
- read_offset = 0;
- *actual_read_offset = read_offset;
- }
- }
+ } else if ((read_offset + unused_bytes) > last_buf_byte_offset) {
+ read_offset = last_buf_byte_offset + 1;
+ if (read_offset >= max_offset)
+ read_offset = 0;
+ *actual_read_offset = read_offset;
}
read_bufno = read_offset / buf_size;
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: read() on relayfs channel returns premature 0 2005-03-23 9:02 read() on relayfs channel returns premature 0 kingsley @ 2005-03-23 15:29 ` Tom Zanussi 2005-03-24 1:29 ` Kingsley Cheung 2005-04-18 1:29 ` Relayfs Question: Use of relay_reset(). Potential race? Kingsley Cheung 1 sibling, 1 reply; 12+ messages in thread From: Tom Zanussi @ 2005-03-23 15:29 UTC (permalink / raw) To: kingsley; +Cc: Tom Zanussi, Karim Yaghmour, linux-kernel kingsley@aurema.com writes: > > Now I understand that this is not the latest release of relayfs (there > are the redux patches, which I have yet to try). Nonetheless I'd like > to know whether this behaviour is deliberate. Is it? Nope, looks like you've found a bug - thanks for the patch. Any chance you can send me your module so I can easily reproduce the problem and test the fix? Thanks, Tom ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-23 15:29 ` Tom Zanussi @ 2005-03-24 1:29 ` Kingsley Cheung 2005-03-24 6:56 ` Jan Engelhardt 2005-03-24 19:11 ` Tom Zanussi 0 siblings, 2 replies; 12+ messages in thread From: Kingsley Cheung @ 2005-03-24 1:29 UTC (permalink / raw) To: Tom Zanussi; +Cc: Karim Yaghmour, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1038 bytes --] On Wed, Mar 23, 2005 at 09:29:12AM -0600, Tom Zanussi wrote: > kingsley@aurema.com writes: > > > > Now I understand that this is not the latest release of relayfs (there > > are the redux patches, which I have yet to try). Nonetheless I'd like > > to know whether this behaviour is deliberate. Is it? > > Nope, looks like you've found a bug - thanks for the patch. Any > chance you can send me your module so I can easily reproduce the > problem and test the fix? > > Thanks, > > Tom > Tom, Yes, well a cut down version of the kernel module anyway. Its in the attached tar ball. The kernel module requires pagg as well as relayfs to work. Basically the module tracks kernel events - in this case process execs. I've tested it on 2.6.10 with the pagg and relayfs patches from http://www.opersys.com/ftp/pub/relayfs/patch-relayfs-2.6.10-050113 and ftp://oss.sgi.com/projects/pagg/download/linux-2.6.10-pagg.patch-4 read() gives me a zero still once about a page of data has been read. Many thanks, -- Kingsley [-- Attachment #2: RelayfsModule.tar.bz2 --] [-- Type: application/x-bzip2, Size: 2629 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-24 1:29 ` Kingsley Cheung @ 2005-03-24 6:56 ` Jan Engelhardt 2005-03-24 19:11 ` Tom Zanussi 1 sibling, 0 replies; 12+ messages in thread From: Jan Engelhardt @ 2005-03-24 6:56 UTC (permalink / raw) To: Kingsley Cheung; +Cc: linux-kernel 1>http://www.opersys.com/ftp/pub/relayfs/patch-relayfs-2.6.10-050113 2>ftp://oss.sgi.com/projects/pagg/download/linux-2.6.10-pagg.patch-4 I guess (2) is already in 2.6.11, because I do not get any compile and load errors (good so!), is it? Jan Engelhardt -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-24 1:29 ` Kingsley Cheung 2005-03-24 6:56 ` Jan Engelhardt @ 2005-03-24 19:11 ` Tom Zanussi 2005-03-24 19:45 ` Jan Engelhardt 2005-03-28 23:43 ` Kingsley Cheung 1 sibling, 2 replies; 12+ messages in thread From: Tom Zanussi @ 2005-03-24 19:11 UTC (permalink / raw) To: Kingsley Cheung; +Cc: Tom Zanussi, Karim Yaghmour, linux-kernel Kingsley Cheung writes: > On Wed, Mar 23, 2005 at 09:29:12AM -0600, Tom Zanussi wrote: > > kingsley@aurema.com writes: > > > > > > Now I understand that this is not the latest release of relayfs (there > > > are the redux patches, which I have yet to try). Nonetheless I'd like [...] > > I've tested it on 2.6.10 with the pagg and relayfs patches from > > http://www.opersys.com/ftp/pub/relayfs/patch-relayfs-2.6.10-050113 > > and > > ftp://oss.sgi.com/projects/pagg/download/linux-2.6.10-pagg.patch-4 > > read() gives me a zero still once about a page of data has been read. > Thanks - I've tried it out and haven't immediately been able to reproduce the problem yet - I'll do some more pounding on it though when I get the chance. BTW, I just want to point out that there aren't any problems with read() in the version of relayfs included in the -mm tree (i.e. the 'redux' version), since of course it doesn't support read(). I went ahead and did a quick port of your stuff to the new version of relayfs, which you can find here: http://prdownloads.sourceforge.net/dprobes/RelayfsModule-new.tar.bz2?download There's a README in the tarball with some notes on building and running it. It includes both the kernel and user-side code, which makes use of the relay-apps code and documentation found here (I've included the necessary files in the RelayfsModule-new tarball so you don't have to actually get this): http://prdownloads.sourceforge.net/dprobes/relay-apps-0.2.tar.gz?download Hopefully the new version will still be useful for what you're trying to do, but it does differ in a couple important ways from the old version, and points up the fact that the new relayfs really is now much more specialized for high-volume applications - - your old version used 'packet' mode with read(). The new relayfs only supports 'bulk' mode with mmap(), which means it's not really useful for notification of single events. I'm not sure how important that is for your application - if you're mainly interested in collecting data, you can certainly use it even for low-volume applications, but the events will be sent to user space in 'blocks'. You can modify the user space app to process blocks of events as they come in, and play around with buffer sizes to get them more often, but it's not quite the same thing. - your old version used a single buffer, while the new relayfs only supports per-cpu buffers, which means you'd have to sort out the events in user space and impose some ordering using timestamps for instance if your data doesn't already have a natural ordering (BTW, the new relayfs doesn't have an option any longer to do automatic timestamping either). I'll continue maintaining the old relayfs for existing users (so thanks for the patch and the test code) so if the new version doesn't fit your needs, you'll still have the old version to fall back on. Thanks, Tom ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-24 19:11 ` Tom Zanussi @ 2005-03-24 19:45 ` Jan Engelhardt 2005-03-25 12:27 ` Karim Yaghmour 2005-03-28 23:43 ` Kingsley Cheung 1 sibling, 1 reply; 12+ messages in thread From: Jan Engelhardt @ 2005-03-24 19:45 UTC (permalink / raw) Cc: linux-kernel >BTW, I just want to point out that there aren't any problems with >read() in the version of relayfs included in the -mm tree (i.e. the >'redux' version), since of course it doesn't support read(). Hm? Relayfs does not support a `cat /dev/relay/AChannelName` anymore? >I'll continue maintaining the old relayfs for existing users (so >thanks for the patch and the test code) so if the new version doesn't >fit your needs, you'll still have the old version to fall back on. Do you have the "new relayfs" as a "normal" file (outside any revision control system), e.g. a diff patch? >Hopefully the new version will still be useful for what you're trying to do >- your old version used 'packet' mode with read(). The new relayfs >only supports 'bulk' mode with mmap() >- your old version used a single buffer, while the new relayfs only >supports per-cpu buffers >(BTW, the new relayfs doesn't have an option any longer to do automatic >timestamping either). I wanted to port the kernel side of my keylogger over to relayfs, but given that API and functionality have now changed I am somewhat reluctant to do it. Jan Engelhardt -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-24 19:45 ` Jan Engelhardt @ 2005-03-25 12:27 ` Karim Yaghmour 0 siblings, 0 replies; 12+ messages in thread From: Karim Yaghmour @ 2005-03-25 12:27 UTC (permalink / raw) To: Jan Engelhardt; +Cc: linux-kernel Jan Engelhardt wrote: > Hm? Relayfs does not support a `cat /dev/relay/AChannelName` anymore? This was a requirement for it to be included. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() on relayfs channel returns premature 0 2005-03-24 19:11 ` Tom Zanussi 2005-03-24 19:45 ` Jan Engelhardt @ 2005-03-28 23:43 ` Kingsley Cheung 1 sibling, 0 replies; 12+ messages in thread From: Kingsley Cheung @ 2005-03-28 23:43 UTC (permalink / raw) To: Tom Zanussi; +Cc: Karim Yaghmour, linux-kernel On Thu, Mar 24, 2005 at 01:11:39PM -0600, Tom Zanussi wrote: > Kingsley Cheung writes: > > On Wed, Mar 23, 2005 at 09:29:12AM -0600, Tom Zanussi wrote: > > > kingsley@aurema.com writes: > > > > > > > > Now I understand that this is not the latest release of relayfs (there > > > > are the redux patches, which I have yet to try). Nonetheless I'd like > > [...] > > > > > I've tested it on 2.6.10 with the pagg and relayfs patches from > > > > http://www.opersys.com/ftp/pub/relayfs/patch-relayfs-2.6.10-050113 > > > > and > > > > ftp://oss.sgi.com/projects/pagg/download/linux-2.6.10-pagg.patch-4 > > > > read() gives me a zero still once about a page of data has been read. > > > > Thanks - I've tried it out and haven't immediately been able to > reproduce the problem yet - I'll do some more pounding on it though > when I get the chance. Ok. It should be fairly easy to reproduce. On a dual 400MHz PII I've been able to get the zero by running one reader of the channel in parallel with a simple shell script like "while :; do ps -ef > /dev/null; done". > > BTW, I just want to point out that there aren't any problems with > read() in the version of relayfs included in the -mm tree (i.e. the > 'redux' version), since of course it doesn't support read(). Ok. > I went ahead and did a quick port of your stuff to the new version of > relayfs, which you can find here: > > http://prdownloads.sourceforge.net/dprobes/RelayfsModule-new.tar.bz2?download > > There's a README in the tarball with some notes on building and > running it. It includes both the kernel and user-side code, which makes > use of the relay-apps code and documentation found here (I've included > the necessary files in the RelayfsModule-new tarball so you don't have > to actually get this): > > http://prdownloads.sourceforge.net/dprobes/relay-apps-0.2.tar.gz?download > > Hopefully the new version will still be useful for what you're trying > to do, but it does differ in a couple important ways from the old > version, and points up the fact that the new relayfs really is now > much more specialized for high-volume applications - > > - your old version used 'packet' mode with read(). The new relayfs > only supports 'bulk' mode with mmap(), which means it's not really > useful for notification of single events. I'm not sure how important > that is for your application - if you're mainly interested in > collecting data, you can certainly use it even for low-volume > applications, but the events will be sent to user space in 'blocks'. > You can modify the user space app to process blocks of events as they > come in, and play around with buffer sizes to get them more often, but > it's not quite the same thing. > > - your old version used a single buffer, while the new relayfs only > supports per-cpu buffers, which means you'd have to sort out the > events in user space and impose some ordering using timestamps for > instance if your data doesn't already have a natural ordering (BTW, > the new relayfs doesn't have an option any longer to do automatic > timestamping either). Right. Thanks for all your work on it Tom . I really appreciate it. I have been considering a move over to the redux version recently. Your port will make it easier for me to test both versions out. I'll keep your pointers in mind. > I'll continue maintaining the old relayfs for existing users (so > thanks for the patch and the test code) so if the new version doesn't > fit your needs, you'll still have the old version to fall back on. > > Thanks, > > Tom Ok. Thanks again! -- Kingsley ^ permalink raw reply [flat|nested] 12+ messages in thread
* Relayfs Question: Use of relay_reset(). Potential race? 2005-03-23 9:02 read() on relayfs channel returns premature 0 kingsley 2005-03-23 15:29 ` Tom Zanussi @ 2005-04-18 1:29 ` Kingsley Cheung 2005-04-18 15:56 ` Tom Zanussi 1 sibling, 1 reply; 12+ messages in thread From: Kingsley Cheung @ 2005-04-18 1:29 UTC (permalink / raw) To: Tom Zanussi; +Cc: Karim Yaghmour, linux-kernel On Wed, Mar 23, 2005 at 08:02:54PM +1100, kingsley@aurema.com wrote: > Hi > > I'm using relayfs to relay data from a kernel module to user space on > a SuSE 2.6.5 kernel. I'm not absolutely sure what version of relayfs > has been back ported to it. Hi Tom, Could you please have a look at the following use of relay_reset() in a kernel module as follows (compiled against pre-redux relayfs): static int exec_fileop_notify(int rchan_id, struct file *filp, enum relay_fileop op) { if (unlikely(rchan_id != exec_cid)) { printk(KERN_ERR "%s - bad file number\n", __FUNCTION__); return -EBADF; } switch (op) { case RELAY_FILE_OPEN: atomic_inc(&exec_client_cnt); break; case RELAY_FILE_CLOSE: if (atomic_dec_and_test(&exec_client_cnt) == 0) relay_reset(exec_cid); <--- break; default: /* do nothing */ break; } return 0; } Is that legitimate? The reason I ask is because I've been seeing garbled oopses with keventd and I've narrowed it to two things: 1) Inadequate locking on my part in the kernel module, which I have addressed separately. 2) A race with relay_reset() and keventd, which is probably of interest to you if you're still maintaining the pre-redux patches. The race is due to the use of INIT_WORK in _reset_relay(): INIT_WORK(&rchan->wake_readers, NULL, NULL); INIT_WORK(&rchan->wake_writers, NULL, NULL); However, at the time relay_reset() is called, it is possible that these work structures are still being used by keventd when under heavy loads. The workaround I've used to fix this is to call flush_scheduled_work() before calling reset_relay() in the kernel module. Perhaps that needs to be called in relay_reset() or _relay_reset()? As well I'm not sure about the uses of INIT_WORK in _relay_realloc_buffer() and relay_release() - perhaps they need attention too. I understand, however, that flush_schedule_work() blocks and thus it probably shouldn't be used in certain areas of the relayfs code. My thanks, -- Kingsley ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Relayfs Question: Use of relay_reset(). Potential race? 2005-04-18 1:29 ` Relayfs Question: Use of relay_reset(). Potential race? Kingsley Cheung @ 2005-04-18 15:56 ` Tom Zanussi 2005-04-19 0:40 ` Kingsley Cheung 0 siblings, 1 reply; 12+ messages in thread From: Tom Zanussi @ 2005-04-18 15:56 UTC (permalink / raw) To: Kingsley Cheung; +Cc: Tom Zanussi, Karim Yaghmour, linux-kernel Kingsley Cheung writes: > On Wed, Mar 23, 2005 at 08:02:54PM +1100, kingsley@aurema.com wrote: > > Hi > > > > I'm using relayfs to relay data from a kernel module to user space on > > a SuSE 2.6.5 kernel. I'm not absolutely sure what version of relayfs > > has been back ported to it. > > Hi Tom, > > Could you please have a look at the following use of relay_reset() in > a kernel module as follows (compiled against pre-redux relayfs): > > static int > exec_fileop_notify(int rchan_id, struct file *filp, enum relay_fileop op) > { > if (unlikely(rchan_id != exec_cid)) { > printk(KERN_ERR "%s - bad file number\n", __FUNCTION__); > return -EBADF; > } > > switch (op) { > case RELAY_FILE_OPEN: > atomic_inc(&exec_client_cnt); > break; > case RELAY_FILE_CLOSE: > if (atomic_dec_and_test(&exec_client_cnt) == 0) > relay_reset(exec_cid); <--- > break; > default: > /* do nothing */ > break; > } > > return 0; > } > > Is that legitimate? The reason I ask is because I've been seeing Yes, you should be able to reset the channel here, since at that point it's been closed. > garbled oopses with keventd and I've narrowed it to two things: > > 1) Inadequate locking on my part in the kernel module, which I have > addressed separately. > > 2) A race with relay_reset() and keventd, which is probably of > interest to you if you're still maintaining the pre-redux patches. > > The race is due to the use of INIT_WORK in _reset_relay(): > > INIT_WORK(&rchan->wake_readers, NULL, NULL); > INIT_WORK(&rchan->wake_writers, NULL, NULL); > > However, at the time relay_reset() is called, it is possible that > these work structures are still being used by keventd when under heavy > loads. The workaround I've used to fix this is to call > flush_scheduled_work() before calling reset_relay() in the kernel > module. Perhaps that needs to be called in relay_reset() or > _relay_reset()? Yes, flush_scheduled_work() should probably be called from __relay_reset() - thanks for catching this and suggesting the fix. BTW, I've updated the old relayfs patch with your previous fixes and ported it to 2.6.11 - it hasn't appeared yet on the opersys website, so let me know if you want it and I'll send it, or I can just send you a new version after I make these changes... > > As well I'm not sure about the uses of INIT_WORK in > _relay_realloc_buffer() and relay_release() - perhaps they need > attention too. I understand, however, that flush_schedule_work() I'll take a look at this too - looks like there might be a potential problem if a channel was closed while a resize was in progress. I don't think anyone's ever actually used resizing, but it should be fixed nonetheless. Thanks, Tom ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Relayfs Question: Use of relay_reset(). Potential race? 2005-04-18 15:56 ` Tom Zanussi @ 2005-04-19 0:40 ` Kingsley Cheung 2005-05-04 9:04 ` kingsley 0 siblings, 1 reply; 12+ messages in thread From: Kingsley Cheung @ 2005-04-19 0:40 UTC (permalink / raw) To: Tom Zanussi; +Cc: Karim Yaghmour, linux-kernel On Mon, Apr 18, 2005 at 10:56:57AM -0500, Tom Zanussi wrote: > Kingsley Cheung writes: > > On Wed, Mar 23, 2005 at 08:02:54PM +1100, kingsley@aurema.com wrote: > > > Hi > > > > > > I'm using relayfs to relay data from a kernel module to user space on > > > a SuSE 2.6.5 kernel. I'm not absolutely sure what version of relayfs > > > has been back ported to it. > > > > Hi Tom, > > > > Could you please have a look at the following use of relay_reset() in > > a kernel module as follows (compiled against pre-redux relayfs): (snip) > > Is that legitimate? The reason I ask is because I've been seeing > > Yes, you should be able to reset the channel here, since at that point > it's been closed. > > > garbled oopses with keventd and I've narrowed it to two things: > > > > 1) Inadequate locking on my part in the kernel module, which I have > > addressed separately. > > > > 2) A race with relay_reset() and keventd, which is probably of > > interest to you if you're still maintaining the pre-redux patches. > > > > The race is due to the use of INIT_WORK in _reset_relay(): > > > > INIT_WORK(&rchan->wake_readers, NULL, NULL); > > INIT_WORK(&rchan->wake_writers, NULL, NULL); > > > > However, at the time relay_reset() is called, it is possible that > > these work structures are still being used by keventd when under heavy > > loads. The workaround I've used to fix this is to call > > flush_scheduled_work() before calling reset_relay() in the kernel > > module. Perhaps that needs to be called in relay_reset() or > > _relay_reset()? Tom, Thanks for the prompt response. > Yes, flush_scheduled_work() should probably be called from > __relay_reset() - thanks for catching this and suggesting the fix. My pleasure. > BTW, I've updated the old relayfs patch with your previous fixes and > ported it to 2.6.11 - it hasn't appeared yet on the opersys website, > so let me know if you want it and I'll send it, or I can just send you > a new version after I make these changes... Ok, good to hear. I've been using the old patch against an earlier version of the kernel, so I've no rush for a patch against 2.6.11 - you can take your time with these new changes :) > > As well I'm not sure about the uses of INIT_WORK in > > _relay_realloc_buffer() and relay_release() - perhaps they need > > attention too. I understand, however, that flush_schedule_work() > > I'll take a look at this too - looks like there might be a potential > problem if a channel was closed while a resize was in progress. I > don't think anyone's ever actually used resizing, but it should be > fixed nonetheless. > > Thanks, > > Tom > Right. Thanks again for looking at it. -- Kingsley ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Relayfs Question: Use of relay_reset(). Potential race? 2005-04-19 0:40 ` Kingsley Cheung @ 2005-05-04 9:04 ` kingsley 0 siblings, 0 replies; 12+ messages in thread From: kingsley @ 2005-05-04 9:04 UTC (permalink / raw) To: Tom Zanussi, Karim Yaghmour, linux-kernel On Tue, Apr 19, 2005 at 10:40:23AM +1000, Kingsley Cheung wrote: > On Mon, Apr 18, 2005 at 10:56:57AM -0500, Tom Zanussi wrote: > > Kingsley Cheung writes: > > > On Wed, Mar 23, 2005 at 08:02:54PM +1100, kingsley@aurema.com wrote: > > > > Hi > > > > > > > > I'm using relayfs to relay data from a kernel module to user space on > > > > a SuSE 2.6.5 kernel. I'm not absolutely sure what version of relayfs > > > > has been back ported to it. > > > > > > Hi Tom, > > > > > > Could you please have a look at the following use of relay_reset() in > > > a kernel module as follows (compiled against pre-redux relayfs): > (snip) > > > Is that legitimate? The reason I ask is because I've been seeing > > > > Yes, you should be able to reset the channel here, since at that point > > it's been closed. > > > > > garbled oopses with keventd and I've narrowed it to two things: > > > > > > 1) Inadequate locking on my part in the kernel module, which I have > > > addressed separately. > > > > > > 2) A race with relay_reset() and keventd, which is probably of > > > interest to you if you're still maintaining the pre-redux patches. > > > > > > The race is due to the use of INIT_WORK in _reset_relay(): > > > > > > INIT_WORK(&rchan->wake_readers, NULL, NULL); > > > INIT_WORK(&rchan->wake_writers, NULL, NULL); > > > > > > However, at the time relay_reset() is called, it is possible that > > > these work structures are still being used by keventd when under heavy > > > loads. The workaround I've used to fix this is to call > > > flush_scheduled_work() before calling reset_relay() in the kernel > > > module. Perhaps that needs to be called in relay_reset() or > > > _relay_reset()? > > Tom, > > Thanks for the prompt response. > > > Yes, flush_scheduled_work() should probably be called from > > __relay_reset() - thanks for catching this and suggesting the fix. Tom, Sorry about this, but the fix only works if schedule_work() is used instead of schedule_delayed_work() for the pre-redux relayfs patch. I only realised this recently since I have been using the "flush" fix on a pre-redux relayfs port to 2.4 which doesn't doesn't have an equivalent of schedule_delayed_work() and have only tried it on 2.6 today. Using flush_scheduled_work() with schedule_delayed_work() doesn't work since schedule_delay_work() uses a timer to queue work at the appropriate time. Although all uses of schedule_delay_work() in relayfs adds work to the queue a tick later, this time delay is enough for flush_schedule_work() to flushes what is presently on the queue before the timer actually expires. eventd would then attempt to use a work_struct that has then been initialised by relay_reset(). With schedule_work() is instead of schedule_delayed_work() I haven't seen the race happen. Thanks, -- Kingsley ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2005-05-04 9:05 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-03-23 9:02 read() on relayfs channel returns premature 0 kingsley 2005-03-23 15:29 ` Tom Zanussi 2005-03-24 1:29 ` Kingsley Cheung 2005-03-24 6:56 ` Jan Engelhardt 2005-03-24 19:11 ` Tom Zanussi 2005-03-24 19:45 ` Jan Engelhardt 2005-03-25 12:27 ` Karim Yaghmour 2005-03-28 23:43 ` Kingsley Cheung 2005-04-18 1:29 ` Relayfs Question: Use of relay_reset(). Potential race? Kingsley Cheung 2005-04-18 15:56 ` Tom Zanussi 2005-04-19 0:40 ` Kingsley Cheung 2005-05-04 9:04 ` kingsley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox