From: Bob Peterson
Date: Fri, 5 Jun 2020 12:14:57 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
References: <20200526130536.295081-1-rpeterso@redhat.com> <20200526130536.295081-7-rpeterso@redhat.com> <435435062.31970561.1591368532509.JavaMail.zimbra@redhat.com>
Message-ID: <1003983013.31986358.1591373697538.JavaMail.zimbra@redhat.com>
To: cluster-devel.redhat.com

----- Original Message -----
> On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson wrote:
> > Hi Andreas,
> >
> > ----- Original Message -----
> > (snip)
> > > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > > gfs2_glock *gl, u32 flags)
> > > >
> > > >  	if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > >  		if (!sdp->sd_log_idle) {
> > > > +			unsigned long start = jiffies;
> > > > +
> > > >  			for (;;) {
> > > > +				if (time_after(jiffies, start + (HZ *
> > > > 600))) {
> > >
> > > This should probably have some rate limiting as well, for example:
> >
> > Seems unnecessary. If the log flush gets stuck, the message will be printed
> > once, and at most every 10 minutes.
>
> No, after ten minutes, the message will actually be printed for each
> iteration of the loop. That's exactly why I was suggesting the rate
> limiting.

No, after ten minutes it dumps the ail list so you can see the problem and
exits the loop with "break;". The next time it enters the loop, it starts
with a new value of start which doesn't expire for another ten minutes.

Bob