public inbox for hail-devel@vger.kernel.org
* Post-XDR CLD cannot keep session up
@ 2010-02-07  7:00 Pete Zaitcev
  2010-02-07 22:28 ` Jeff Garzik
  2010-02-09 10:34 ` Jeff Garzik
  0 siblings, 2 replies; 5+ messages in thread
From: Pete Zaitcev @ 2010-02-07  7:00 UTC (permalink / raw)
  To: Project Hail List

Hi, Jeff & Colin:

It looks like you broke something in CLD, not sure if server or client.
There are two possibly related bugs. But first, here are the messages
(chunkd is run with -D). Note that I have 2 servers listed in DNS
(both on port 4499), but only one is up.

Feb  6 23:36:10 hitlain cld[1934]: databases up
Feb  6 23:36:10 hitlain cld[1934]: Listening on :: port 4499
Feb  6 23:36:10 hitlain cld[1934]: initialized: verbose 0
Feb  6 23:37:10 hitlain chunkd[1967]: Verbose debug output enabled
Feb  6 23:37:10 hitlain chunkd[1968]: cldc_saveaddr: found CLD host hitlain.zaitcev.lan prio 10 weight 50
Feb  6 23:37:10 hitlain chunkd[1968]: cldc_saveaddr: found CLD host elanor.zaitcev.lan prio 10 weight 50
Feb  6 23:37:10 hitlain chunkd[1968]: Selected CLD host hitlain.zaitcev.lan port 4499
Feb  6 23:37:10 hitlain chunkd[1968]: Listening on host :: port 8082
Feb  6 23:37:10 hitlain chunkd[1968]: initialized
Feb  6 23:37:10 hitlain chunkd[1968]: New CLD session created, sid 05B521BF4071EBA2
Feb  6 23:37:10 hitlain chunkd[1968]: CLD file "/chunk-default/2" created
Feb  6 23:37:10 hitlain chunkd[1968]: CLD file "/chunk-default/2" written
Feb  6 23:39:45 hitlain chunkd[1968]: Session failed, sid 05B521BF4071EBA2
Feb  6 23:39:45 hitlain chunkd[1968]: Selected CLD host elanor.zaitcev.lan port 4499
Feb  6 23:39:45 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:39:50 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:39:55 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:00 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:05 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:10 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:15 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:21 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:26 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:31 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:36 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:41 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:46 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:51 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:40:56 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:01 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:06 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:11 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:16 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:21 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:26 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:31 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:36 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:41 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb  6 23:41:46 hitlain chunkd[1968]: New CLD session creation failed: 17
Feb  6 23:41:46 hitlain chunkd[1968]: Session failed, sid 6C5A5E5D4D8F2270
Feb  6 23:41:46 hitlain chunkd[1968]: Selected CLD host hitlain.zaitcev.lan port 4499
Feb  6 23:41:46 hitlain chunkd[1968]: New CLD session created, sid 4E2A8ED73878F038
Feb  6 23:41:46 hitlain chunkd[1968]: CLD file "/chunk-default/2" created
Feb  6 23:41:46 hitlain chunkd[1968]: CLD lock(/chunk-default/2) failed: 11

So, first regression: session ALWAYS fails, for no reason I can see.
It takes 2 minutes 35 seconds, as you can observe from the "Session failed"
message.

Second regression: locks of failed session are not removed (this is
what code 11 is). Once the original session fails, CLD client cannot
re-acquire the lock, ever, until the daemon is restarted.

This definitely used to work before the XDR, and it only takes 3 minutes
to fail. Do you guys run and use chunkd, or do you just do "make check" and
consider it done? I thought we talked about having virtually permanent
cells and long-living CLD clients, because this sort of thing keeps
cropping up.

Cheers,
-- Pete
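
A note on the numeric codes in the log above: the -111 from
cldc_udp_receive_pkt looks like a negated errno, and on Linux/x86 errno 111
is ECONNREFUSED, which would fit elanor being listed in DNS but not running.
That reading is an assumption and is not confirmed anywhere in this thread.
The helper below is a hypothetical sketch (demo_rc_text is not part of CLD)
for turning such a code into text. The 17 and 11 codes are CLD-level results
and are deliberately not decoded here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper, not part of CLD: interpret a return code as
 * 0 on success or a negated errno value on failure, and return a
 * human-readable description of it.
 */
const char *demo_rc_text(int rc)
{
    assert(rc <= 0);            /* this sketch only handles 0 or -errno */
    if (rc == 0)
        return "success";
    return strerror(-rc);       /* e.g. -ECONNREFUSED -> system message */
}
```

Calling demo_rc_text(-ECONNREFUSED) returns the platform's message for a
refused connection, which is what a UDP client typically sees after an ICMP
port-unreachable reply from a host with no listener.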


* Re: Post-XDR CLD cannot keep session up
  2010-02-07  7:00 Post-XDR CLD cannot keep session up Pete Zaitcev
@ 2010-02-07 22:28 ` Jeff Garzik
  2010-02-09 10:34 ` Jeff Garzik
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2010-02-07 22:28 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 02/07/2010 02:00 AM, Pete Zaitcev wrote:
> Hi, Jeff & Colin:
>
> It looks like you broke something in CLD, not sure if server or client.
> There are two possibly related bugs. But first, here's the messages
> (The chunkd is run with -D). Note that I have 2 servers listed in DNS
> (both on port 4499), but only one is up.
>
> [log snipped]
>
> So, first regression: session ALWAYS fails, for no reason I can see.
> It takes 2 minutes 35 seconds, as you can observe from the "Session failed"
> message.
>
> Second regression: locks of failed session are not removed (this is
> what code 11 is). Once the original session fails, CLD client cannot
> re-acquire the lock, ever, until the daemon is restarted.

Thanks for the report.  That is definitely annoying...  I wonder if it 
is related to the ping_open bug I fixed...


> This definitely used to work before the XDR, and it only takes 3 minutes
> to fail. Do you guys run and use chunkd, or do you just do "make check" and
> consider it done? I thought we talked about having virtually permanent
> cells and long-living CLD clients, because this sort of thing keeps
> cropping up.

My local one (shamefully not using SRV, like I should) is pretty 
outdated, back to the latest released tarballs, since I dislike having 
to lose data on upgrade ;-)

	Jeff





* Re: Post-XDR CLD cannot keep session up
  2010-02-07  7:00 Post-XDR CLD cannot keep session up Pete Zaitcev
  2010-02-07 22:28 ` Jeff Garzik
@ 2010-02-09 10:34 ` Jeff Garzik
  2010-02-09 12:06   ` Jeff Garzik
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2010-02-09 10:34 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 02/07/2010 02:00 AM, Pete Zaitcev wrote:
> Hi, Jeff & Colin:
>
> It looks like you broke something in CLD, not sure if server or client.
> There are two possibly related bugs. But first, here's the messages
> (The chunkd is run with -D). Note that I have 2 servers listed in DNS
> (both on port 4499), but only one is up.
>
> [log snipped]
>
> So, first regression: session ALWAYS fails, for no reason I can see.
> It takes 2 minutes 35 seconds, as you can observe from the "Session failed"
> message.


Well, session_timeout() is not being executed by the core timer code 
like it should be.  This could be memory corruption, a libtimer bug, or 
something else entirely.  I can observe session_timeout() being updated 
to a new timer expiration, and then never being called again.

Off to run valgrind...

	Jeff




* Re: Post-XDR CLD cannot keep session up
  2010-02-09 10:34 ` Jeff Garzik
@ 2010-02-09 12:06   ` Jeff Garzik
  2010-02-09 16:12     ` Pete Zaitcev
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2010-02-09 12:06 UTC (permalink / raw)
  To: Project Hail List; +Cc: Pete Zaitcev

On 02/09/2010 05:34 AM, Jeff Garzik wrote:
> On 02/07/2010 02:00 AM, Pete Zaitcev wrote:
>> Hi, Jeff & Colin:
>>
>> It looks like you broke something in CLD, not sure if server or client.
>> There are two possibly related bugs. But first, here's the messages
>> (The chunkd is run with -D). Note that I have 2 servers listed in DNS
>> (both on port 4499), but only one is up.
>>
>> [log snipped]
>>
>> So, first regression: session ALWAYS fails, for no reason I can see.
>> It takes 2 minutes 35 seconds, as you can observe from the "Session failed"
>> message.
>
>
> Well, session_timeout() is not being executed like it should be, by the
> core timer code. This could be memory corruption, a libtimer bug, or
> something else entirely. I can observe session_timeout() being updated
> to a new timer expiration, and then never being called again.

There is definitely something strange going on in the timer routines 
that is causing session_timeout() not to run even though it re-adds 
itself to the timer list using cld_timer_add().  fprintf() debug output 
in cld_timer_add() and cld_timers_run() is yielding unexpected results.

More debugging after sleep.

	Jeff




* Re: Post-XDR CLD cannot keep session up
  2010-02-09 12:06   ` Jeff Garzik
@ 2010-02-09 16:12     ` Pete Zaitcev
  0 siblings, 0 replies; 5+ messages in thread
From: Pete Zaitcev @ 2010-02-09 16:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Project Hail List

On Tue, 09 Feb 2010 07:06:39 -0500
Jeff Garzik <jeff@garzik.org> wrote:

> There is definitely something strange going on in the timer routines, 
> that is causing session_timeout() not to run even though it re-adds 
> itself to the timer list using cld_timer_add().  fprintf() debug output 
> in cld_timer_add() and cld_timers_run() is yielding unexpected results.

Shoot, I think I know what this is, and it's my fault. The list is
"cached" improperly inside cld_timers_run. I remember that at some
point I added a mutex to every list and noticed that the list wasn't
locked correctly, so I fixed it. But then I dropped those mutexes because
of some recursion issues and undid the fix. I'll retest and send a
patch in a few.

-- Pete
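
For readers unfamiliar with this class of bug, here is a minimal sketch of
how a timer-run loop that works from a cached copy of the list can lose a
timer that re-adds itself from its own callback, which matches the symptom
described above (session_timeout() re-arms and is never called again). This
is an illustration only: the demo_* names, the struct layout, and both run
functions are hypothetical and are not taken from the CLD sources or from
the eventual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct demo_timer_list;

/* Hypothetical, simplified timer; not the real CLD timer structure. */
struct demo_timer {
    struct demo_timer *next;
    struct demo_timer_list *owner;       /* list the callback re-arms on */
    time_t expires;
    void (*cb)(struct demo_timer *);
    int fired;                           /* how many times cb has run */
};

struct demo_timer_list {
    struct demo_timer *head;             /* unsorted singly linked list */
};

/* Push a timer onto the live list with a new expiration. */
void demo_timer_add(struct demo_timer_list *tl, struct demo_timer *t,
                    time_t expires)
{
    assert(t->cb != NULL);
    t->expires = expires;
    t->next = tl->head;
    tl->head = t;
}

/* Callback that re-arms itself 5 seconds later, like a session timeout. */
void demo_rearm_cb(struct demo_timer *t)
{
    t->fired++;
    demo_timer_add(t->owner, t, t->expires + 5);
}

/* BUGGY: walks a cached copy, then overwrites the live list with it. */
void demo_timers_run_buggy(struct demo_timer_list *tl, time_t now)
{
    struct demo_timer *cached = tl->head;
    struct demo_timer *keep = NULL;

    tl->head = NULL;
    while (cached) {
        struct demo_timer *t = cached;
        cached = t->next;
        t->next = NULL;
        if (t->expires <= now) {
            t->cb(t);                    /* may add timers to tl->head */
        } else {
            t->next = keep;              /* unexpired: keep in private copy */
            keep = t;
        }
    }
    tl->head = keep;                     /* drops whatever callbacks added */
}

/* FIXED: unexpired timers go back onto the live list; nothing is overwritten. */
void demo_timers_run_fixed(struct demo_timer_list *tl, time_t now)
{
    struct demo_timer *pending = tl->head;

    tl->head = NULL;
    while (pending) {
        struct demo_timer *t = pending;
        pending = t->next;
        t->next = NULL;
        if (t->expires <= now) {
            t->cb(t);                    /* re-added timers land on tl->head */
        } else {
            t->next = tl->head;          /* reinsert into the live list */
            tl->head = t;
        }
    }
}
```

With the buggy variant, a timer that fires once and re-arms itself vanishes
from the list after the first run. With the fixed variant it stays queued and
fires again at its next expiration. A real fix also has to consider locking,
which this single-threaded sketch ignores.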
