From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Wed, 28 Jul 2021 08:30:06 -0500
Subject: [Cluster-devel] [GFS2 PATCH 09/15] gfs2: fix deadlock in gfs2_ail1_empty withdraw
In-Reply-To:
References: <20210727173709.210711-1-rpeterso@redhat.com> <20210727173709.210711-10-rpeterso@redhat.com>
Message-ID: <097fc264-fd96-6bc4-eff4-e56fb9ea58ad@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 7/28/21 12:38 AM, Andreas Gruenbacher wrote:
> Hi Bob,
>
> On Tue, Jul 27, 2021 at 7:37 PM Bob Peterson wrote:
>> Before this patch, function gfs2_ail1_empty could issue a file system
>> withdraw when IO errors were discovered. However, there are several
>> callers, including gfs2_flush_revokes() which holds the gfs2_log_lock
>> before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw
>> it would leave the gfs2_log_lock held, which resulted in a deadlock
>> due to other processes that needed the log_lock.
>>
>> Another problem discovered by Christoph Helwig is that we cannot
>> withdraw from the log_flush process because it may be called from
>> the glock workqueue, and the withdraw process waits for that very
>> workqueue to be flushed. So the withdraw must be ignored until it may
>> be handled by a more appropriate context like the gfs2_logd daemon.
>>
>> This patch moves the withdraw out of function gfs2_ail1_empty and
>> makes each of the callers check for a withdraw by calling new function
>> check_ail1_withdraw.
>
>> Function gfs2_flush_revokes now does this check
>> after releasing the gfs2_log_lock to avoid the deadlock.
>
> I don't see that in the code.

Yeah, the comment was wrong. I noticed the problem and already removed
the paragraph after the patch set was sent out.

Bob