From: Jeff Garzik <jeff@garzik.org>
To: Pete Zaitcev <zaitcev@redhat.com>
Cc: Project Hail List <hail-devel@vger.kernel.org>
Subject: Re: [Patch 01/12] CLD: fix crash in retransmissions
Date: Sun, 18 Apr 2010 23:46:07 -0400 [thread overview]
Message-ID: <4BCBD1FF.3080909@garzik.org> (raw)
In-Reply-To: <20100417223709.3f902df3@redhat.com>
On 04/18/2010 12:37 AM, Pete Zaitcev wrote:
> For a longest time I was plagued by (very infrequent) crashes like this:
>
> Program received signal SIGSEGV, Segmentation fault.
> sess_retry_output (timer=0x92070c0) at session.c:532
> 532 if (!next_retry || (op->next_retry< next_retry))
> (gdb) info threads
> * 1 Thread 0xb72f96c0 (LWP 22417) sess_retry_output (timer=0x92070c0) at session.c:532
> (gdb) where
> #0 sess_retry_output (timer=0x92070c0) at session.c:532
> #1 session_retry (timer=0x92070c0) at session.c:565
> #2 0x08049aee in cld_timers_run (tlist=0x8056630) at ../lib/libtimer.c:95
> #3 0x0804e9cc in main_loop (argc=5, argv=0xbff70bd4) at server.c:983
> #4 main (argc=5, argv=0xbff70bd4) at server.c:1138
>
> The crash happens because op is NULL. As it turned out, this happens
> if a packet retransmit and a session expiration occur simultaneously
> (in the same pass of timers_run). The scenario is:
> - timers_run collects expired timers at exec list
> - timers_run expires session
> - two timer_del are called, but one of them is on exec list already,
> so it's ineffective
> - session is freed, this zeroes ->data in lists (later op)
> - timers_run continues along the exec list, invokes the retransmission
> callback, and that crashes with NULL op.
>
> The proposed solution is to rework the timers_run, again. But this
> time, we'll make it simpler by observing that timers are ordered by
> expiration time. Therefore, we can pull next timer off the list,
> expire it, and loop until expiration time is greater than the current
> time. No execution list is kept. The integrity of the main list
> is assured by never walking it and always referring to the head
> anew at each iteration.
>
> This patch appears to fix the problem and stands up to use that
> crashed the old code.
>
> Signed-off-by: Pete Zaitcev<zaitcev@redhat.com>
>
> ---
> include/cld_common.h | 10 ++++++----
> lib/libtimer.c | 41 ++++++++++++++++++++---------------------
> 2 files changed, 26 insertions(+), 25 deletions(-)
applied 1-4
Note that I change the email subject line prefix (which is normally
copied by automated tools directly into git) from "CLD" to "libcldc",
when committing to git. I am trying to follow (and encourage others to)
the kernel's method of using a prefix to indicate the subsystem or
section within the current git repo to which a change applies.
To be fully friendly with automated tools, an ideal Project Hail subject
line might read
[cld patch 1/1] update Makefile.am
or
[tabled patch 1/1] update Makefile.am
or
[cld patch 1/1] libcldc: add nncld API, the new new CLD API
Not a big deal, just noting what is most friendly to the git automated
tools, when I'm importing each Hail patch.
Jeff
next prev parent reply other threads:[~2010-04-19 3:46 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-18 4:37 [Patch 01/12] CLD: fix crash in retransmissions Pete Zaitcev
2010-04-19 3:46 ` Jeff Garzik [this message]
2010-04-19 4:38 ` Pete Zaitcev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BCBD1FF.3080909@garzik.org \
--to=jeff@garzik.org \
--cc=hail-devel@vger.kernel.org \
--cc=zaitcev@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox