From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Fri, 1 Feb 2019 09:51:21 -0500 (EST)
Subject: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free
In-Reply-To: <9b9dd4a6-6529-98e9-b1a1-aab8426cfa86@citrix.com>
References: <20190131105543.15421-1-ross.lagerwall@citrix.com> <20190131105543.15421-2-ross.lagerwall@citrix.com> <9b9dd4a6-6529-98e9-b1a1-aab8426cfa86@citrix.com>
Message-ID: <1750009173.69290319.1549032681784.JavaMail.zimbra@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Ross,

----- Original Message -----
> Do you have any suggestions for tracking down the root cause?

I once had a similar problem in rhel7 and couldn't use kernel tracing
because there were millions of glocks involved. The trace was too huge
and quickly swamped the biggest possible kernel trace buffer.

So I ended up writing the ugly, hacky patch that's attached. Perhaps you
can use it as a starting point. The idea is: every time there's a get or
a put on a glock, it saves off a 1-byte identifier of the function that
did the get/put. The identifiers go into a new 64-byte field kept for
each glock, which of course meant the slab became much bigger, but it
was never meant to be shipped, right?

Then, when the problem occurred, it would dump out the problematic
glock, including the 64-byte get/put history value. I would then go
through that history and work out what went wrong.

Since this is a fairly old (2015) patch that targets an old rhel7, it
will obviously need a lot of updating to get it to work, but it might
work better than kernel tracing, depending on how many glocks are
involved in your test.

Regards,

Bob Peterson
Red Hat File Systems

-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_put.patch
Type: text/x-patch
Size: 25569 bytes
Desc: not available
URL:
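
The get/put history idea described in the message can be sketched in
userspace C. This is only an illustration under stated assumptions, not
the attached get_put.patch: struct tracked_obj stands in for a glock,
the tag values and every function name are hypothetical, and treating
the 64-byte field as a ring that overwrites its oldest entry is a guess
at one reasonable layout rather than something the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define REF_HIST_LEN 64 /* same size as the 64-byte field in the message */

/* Hypothetical one-byte tags, one per call site that takes or drops a ref. */
enum ref_tag {
    TAG_GET_LOOKUP = 'L',
    TAG_GET_WORK   = 'W',
    TAG_PUT_WORK   = 'w',
    TAG_PUT_FINAL  = 'f',
};

/* Stand-in for a refcounted object such as a glock (assumed layout). */
struct tracked_obj {
    int refcount;
    unsigned int hist_pos;            /* total number of events recorded */
    unsigned char hist[REF_HIST_LEN]; /* ring of caller tags */
};

/* Store one tag, overwriting the oldest entry once the ring is full. */
void ref_hist_record(struct tracked_obj *o, unsigned char tag)
{
    o->hist[o->hist_pos % REF_HIST_LEN] = tag;
    o->hist_pos++;
}

void obj_get(struct tracked_obj *o, unsigned char tag)
{
    o->refcount++;
    ref_hist_record(o, tag);
}

void obj_put(struct tracked_obj *o, unsigned char tag)
{
    o->refcount--;
    ref_hist_record(o, tag);
}

/*
 * Copy the recorded tags, oldest first, into out as a NUL-terminated
 * string. Returns the number of tags copied (at most REF_HIST_LEN).
 */
size_t ref_hist_format(const struct tracked_obj *o, char *out, size_t out_len)
{
    assert(out != NULL && out_len > REF_HIST_LEN);

    size_t n = o->hist_pos < REF_HIST_LEN ? o->hist_pos : REF_HIST_LEN;
    unsigned int start = o->hist_pos - (unsigned int)n;

    for (size_t i = 0; i < n; i++)
        out[i] = (char)o->hist[(start + i) % REF_HIST_LEN];
    out[n] = '\0';
    return n;
}

/* Dump the object and its get/put history when a problem is detected. */
void obj_dump(const struct tracked_obj *o)
{
    char buf[REF_HIST_LEN + 1];

    ref_hist_format(o, buf, sizeof(buf));
    fprintf(stderr, "obj %p ref %d history %s\n",
            (const void *)o, o->refcount, buf);
}
```

In this sketch each call site passes its own tag to obj_get() or
obj_put(), and obj_dump() would be called from wherever the bad
reference count is noticed. Reading the history string from left to
right then shows which call sites touched the object last, at a cost of
a few dozen extra bytes per object instead of a global trace buffer.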