From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: [tabled patch] abstract out TCP-write code Date: Wed, 22 Sep 2010 21:26:13 -0400 Message-ID: <4C9AACB5.40403@garzik.org> References: <20100923000908.GA15908@havoc.gtf.org> <20100922182836.566df309@lembas.zaitcev.lan> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=5ESs2KQWIm79DakA6ripPkPH1V666z6fMZj+sndqZks=; b=WsEeH1TmZu8/zxWAWOOdzc/5cpIWh8Qd1Ja2+R3BJ+mX4c8KyO+Oz05vSRuZe9Ovo5 h2kvFbYQ9XKQ1dcdxwjWPM1caWP3WcdEIi+NFYhGWQoJqO9biUO205zMgRTyWZYHgXfa pTpUld/mxnWRiZ1XSkVtB0PshevsWHRAEqO34= In-Reply-To: <20100922182836.566df309@lembas.zaitcev.lan> Sender: hail-devel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Pete Zaitcev Cc: hail-devel@vger.kernel.org On 09/22/2010 08:28 PM, Pete Zaitcev wrote: > On Wed, 22 Sep 2010 20:09:08 -0400 > Jeff Garzik wrote: > >> @@ -1410,7 +1224,7 @@ static void tcp_cli_event(int fd, short events, void *userdata) >> >> do { >> loop = cli->evt_table[cli->state](cli, events); >> - loop |= cli_write_run_compl(); >> + loop |= atcp_write_run_compl(&cli->wst); >> } while (loop); >> } > > This cannot be right. Please see commit > d1a45fca7908b7128ed4fe2ab611111f02ee938f: > > tabled: fix running completions over disposed cli > > Miracluously this never actually crashed on me, but I added unrelated > debugging printout into the dispatch routine and it printed weird > values. Then it dawned on me that a state change function may dispose > of the struct cli, in which case cli_write_run_compl is use-after-free. > > It may seem that checking if the old state was evt_dispose before > running cli_write_run_compl is an expedient fix, but that does not > work, because we do not always dispose of the cli in such case. > If the cli to be disposed still has anything in the queue, we > need to continue to deliver events, and for that we have to > run outstanding completions. > > So, we go a longer route and re-hook the list of completions > to a per-server global instead of a client. The patch is straight- > forward. The only thing we need to be careful is to make sure > that no outstanding completions are left in the queue before > freeing a client struct. This is ensured by force-running completions. > > One other necessary change was to add a back poiter from a completion > to the current client. This is because one caller needed the client > pointer (object_get_more). Thanks for the reminder. Looking at this change again, I don't see how this avoids use-after-free. If completions exist after state change function leads one to cli_evt_dispose() -> cli_free(), then cli_write_run_compl() still calls cli_write_free() with the stale 'cli' pointer. It seems like the real fix is to have the functions in the FSM loop return an additional piece of information, indicating that 'cli' is no longer valid. client_write's object lifetime should always be a subset of client's object lifetime. > Speaking of backpointers, I think it would be much cleaner > to get rid of two-argument format for callback. It stinks of > special-casing. Either throw a back pointer into the first word > of buf, or create some kind of object/struct passed as > argument to atcp_writeq(), that's what I would do. It is a common idiom even in GLib that callbacks receive two anonymous pointers; witness the data type GFunc's 'data' and 'user_data' arguments: http://library.gnome.org/devel/glib/stable/glib-Doubly-Linked-Lists.html#GFunc This is because typically one has two objects -- a parent and a child -- with different object lifetime. When considering both tabled and itd, you see example of this in the need for struct client (tabled) or struct session (itd) pointers on occasion, in cli_writeq calls. Jeff