From: Bob Peterson
Date: Fri, 28 Sep 2018 08:23:58 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements
References: <1537455133-48589-1-git-send-email-mark.syms@citrix.com>
 <77971123.14918571.1537463869960.JavaMail.zimbra@redhat.com>
Message-ID: <552791371.16890843.1538137438323.JavaMail.zimbra@redhat.com>
To: cluster-devel.redhat.com

----- Original Message -----
> Thanks for that, Bob. We've been watching the changes going in upstream with
> interest, but at the moment we're not really in a position to take advantage
> of them.
>
> Due to hardware vendor support certification requirements, XenServer can only
> very occasionally make big kernel bumps that would affect the ABI that the
> driver would see, since that would require our hardware partners to
> recertify. So we're currently on a 4.4.52 base, but the gfs2 driver is
> somewhat newer: it is essentially self-contained, so we can backport changes
> more easily. We currently have most of the GFS2 and DLM changes that are in
> 4.15 backported into the XenServer 7.6 kernel, but we can't take the ones
> related to iomap, as they are more invasive, and it looks like a number of
> the more recent performance-targeting changes are also predicated on the
> iomap framework.
>
> As I mentioned in the covering letter, the intra-host problem would largely
> be a non-issue if EX glocks were actually a host-wide thing, with local
> mutexes used to share them within the host. I don't know if this is what
> your patch set is trying to achieve or not. It's not so much that the
> selection of resource group is "random", just that there is a random chance
> that we won't select the first RG we test; it probably works out much the
> same though.
>
> The inter-host problem addressed by the second patch seems less amenable to
> avoidance, as the hosts don't seem to have a synchronous view of the state
> of the resource group locks (for understandable reasons, as I'd expect that
> to be very expensive to keep sync'd). So it seemed reasonable to make it
> "expensive" to request a resource that someone else is using, and also to
> avoid immediately grabbing it back if we've been asked to relinquish it.
> That does seem to give a fairer balance to the usage without being massively
> invasive.
>
> We thought we should share these with the community anyway, even if they
> only serve as inspiration for more detailed changes, and also to describe
> the scenarios where we're seeing issues now that we have completed
> implementing the XenServer support for GFS2 that we discussed back in
> Nuremberg last year. In our testing they certainly make things better. They
> probably aren't fully optimal, as we can't maintain 10G wire speed
> consistently across the full LUN, but we're getting about 75%, which is
> certainly better than what we were seeing before we started looking at this.
>
> Thanks,
>
> Mark.

Hi Mark,

I'm really curious whether you tried the two patches I posted here on
17 January 2018 in place of the two patches you posted. We see much better
throughput with those than with stock.
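Just so I'm sure I follow the approach in your second patch, the sketch below
shows the kind of selection logic I picture: penalize resource groups that
another node appears to hold, and hold off re-acquiring one we were just asked
to relinquish. It's purely illustrative; the struct, field names, and
thresholds are made up and are not the actual GFS2 or DLM code.

/*
 * Conceptual, self-contained sketch of the back-off idea described above.
 * Hypothetical names and thresholds only; not actual GFS2/DLM code.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define RELINQUISH_HOLDOFF_SECS 5    /* arbitrary hold-off window */
#define CONTENTION_PENALTY      100  /* arbitrary cost for remote contention */

struct rgrp_cand {
	unsigned int id;
	bool held_by_other_node;     /* best-effort, possibly stale view */
	time_t relinquished_at;      /* 0 if never given up on request */
	unsigned int free_blocks;
};

/* Lower score is better; return -1 to skip this candidate entirely. */
static int rgrp_score(const struct rgrp_cand *rg, time_t now)
{
	int score = 0;

	/* Don't immediately grab back something we were just asked to drop. */
	if (rg->relinquished_at &&
	    now - rg->relinquished_at < RELINQUISH_HOLDOFF_SECS)
		return -1;

	/* Make resource groups someone else is using "expensive" to pick. */
	if (rg->held_by_other_node)
		score += CONTENTION_PENALTY;

	/* Among the rest, prefer groups with more free blocks. */
	score += (int)(1000 / (rg->free_blocks + 1));
	return score;
}

int main(void)
{
	struct rgrp_cand rgs[] = {
		{ .id = 1, .held_by_other_node = true,  .free_blocks = 900 },
		{ .id = 2, .held_by_other_node = false, .free_blocks = 400 },
		{ .id = 3, .relinquished_at = time(NULL), .free_blocks = 800 },
	};
	time_t now = time(NULL);
	int best = -1, best_score = 0;

	for (unsigned int i = 0; i < sizeof(rgs) / sizeof(rgs[0]); i++) {
		int s = rgrp_score(&rgs[i], now);
		if (s < 0)
			continue;
		if (best < 0 || s < best_score) {
			best = (int)i;
			best_score = s;
		}
	}
	if (best >= 0)
		printf("would try resource group %u\n", rgs[best].id);
	return 0;
}

However the penalty and hold-off are tuned, I'd guess the important part is
that the hold-off stays bounded so a node can't be starved out of a resource
group it genuinely needs; correct me if that's not what your patch does.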
I know Steve wants a different solution, and in the long run it will be a
better one, but I've been trying to convince him that we should use them as a
stop-gap measure to mitigate this problem until we get a proper solution in
place (which is obviously taking some time, due to unforeseen circumstances).

Regards,

Bob Peterson