From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jim Meyering Subject: Re: [tabled patch] abstract out TCP-write code Date: Thu, 23 Sep 2010 17:28:02 +0200 Message-ID: <87hbhgtr7h.fsf@meyering.net> References: <20100923000908.GA15908@havoc.gtf.org> <20100922182836.566df309@lembas.zaitcev.lan> <4C9AACB5.40403@garzik.org> <20100922203741.48a2b8e6@lembas.zaitcev.lan> <4C9AD849.8030404@garzik.org> <20100923075706.4952c527@lembas.zaitcev.lan> Mime-Version: 1.0 Return-path: In-Reply-To: <20100923075706.4952c527@lembas.zaitcev.lan> (Pete Zaitcev's message of "Thu, 23 Sep 2010 07:57:06 -0600") Sender: hail-devel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Pete Zaitcev Cc: Jeff Garzik , hail-devel@vger.kernel.org Pete Zaitcev wrote: > On Thu, 23 Sep 2010 00:32:09 -0400 > Jeff Garzik wrote: > >> >>> So, we go a longer route and re-hook the list of completions >> >>> to a per-server global instead of a client. The patch is straight- >> >>> forward. The only thing we need to be careful is to make sure >> >>> that no outstanding completions are left in the queue before >> >>> freeing a client struct. This is ensured by force-running completions. >> > >> >> Looking at this change again, I don't see how this avoids >> >> use-after-free. If completions exist after state change function leads >> >> one to cli_evt_dispose() -> cli_free(), then cli_write_run_compl() still >> >> calls cli_write_free() with the stale 'cli' pointer. >> > >> > We run completions before freeing in all cases. My patch was correct. >> >> Logically, if completions are run before freeing in all cases, there is >> no need to make write_compl_q global. That was a red herring, which by >> side effect avoided the bug with the stale 'cli' pointer. > > Side effect or not, if one applies your patch and executes > "export MALLOC_PERTURB_=43" command before "make check", > the result is a crash: > > Core was generated by `../server/tabled -C ../test/tabled-test.conf -E'. > Program terminated with signal 11, Segmentation fault. > #0 atcp_write_free (tmp=0x2b2b2b2b2b2b2afb, done=true) at atcp.c:49 > 49 struct atcp_wr_state *wst = tmp->wst; > (gdb) where > #0 atcp_write_free (tmp=0x2b2b2b2b2b2b2afb, done=true) at atcp.c:49 > #1 0x000000000040387e in atcp_write_run_compl (wst=0x15a9be8) at atcp.c:70 > #2 0x000000000040f20a in tcp_cli_event (fd=, events=2, > userdata=0x15a9af0) at server.c:1227 > #3 0x00007f9320aec774 in event_base_loop () from /usr/lib64/libevent-1.4.so.2 > #4 0x00000000004107a5 in main (argc=, > argv=) at server.c:2139 > (gdb) > > The existing code (with the commit that you criticized) produces no crash. > Granted MALLOC_PERTURB_ is like SLAB debug option -- is not the normal > operating environment -- but IMHO it is completely legitimate. Every developer should have MALLOC_PERTURB_=N (N in 1..255) set in his/her environment on glibc-based systems. Almost all the time. Maybe I should push harder to get it added to rawhide's /etc/profile. I proposed that a few months ago: use MALLOC_PERTURB_ ... or lose http://thread.gmane.org/gmane.linux.redhat.fedora.devel/132690