* [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified
       [not found] <38741039.1176852.1412183985958.JavaMail.zimbra@redhat.com>
@ 2014-10-01 17:21 ` Bob Peterson
  2014-10-01 18:04   ` Steven Whitehouse
  2014-10-01 18:42   ` David Teigland
  0 siblings, 2 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-01 17:21 UTC
To: cluster-devel.redhat.com

Hi,

This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM
to refrain from sending lookup requests in cases where the lock library
node is not the current node. This is similar to the DLM_LKF_NOQUEUE
flag, except it fails locks that would require a lookup, with -EAGAIN.

This is not just about saving a network operation. It allows callers
like GFS2 to master locks for which they are the directory node. Each
node can then "prefer" local locks, especially in the case of GFS2
selecting resource groups for block allocations (implemented with a
separate patch). This mastering of local locks distributes the locks
between the nodes (at least until nodes enter or leave the cluster),
which tends to make each node "keep to itself" when doing allocations.
Thus, dlm communications are kept to a minimum, which results in
significantly faster block allocations.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/dlm/lock.c                     | 16 ++++++++++++++--
 include/uapi/linux/dlmconstants.h |  7 +++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 83f3d55..f1e5b04 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -222,6 +222,11 @@ static inline int can_be_queued(struct dlm_lkb *lkb)
 	return !(lkb->lkb_exflags & DLM_LKF_NOQUEUE);
 }
 
+static inline int can_be_looked_up(struct dlm_lkb *lkb)
+{
+	return !(lkb->lkb_exflags & DLM_LKF_NOLOOKUP);
+}
+
 static inline int force_blocking_asts(struct dlm_lkb *lkb)
 {
 	return (lkb->lkb_exflags & DLM_LKF_NOQUEUEBAST);
@@ -2745,6 +2750,11 @@ static int set_master(struct dlm_rsb *r, struct dlm_lkb *lkb)
 		return 0;
 	}
 
+	if (!can_be_looked_up(lkb)) {
+		queue_cast(r, lkb, -EAGAIN);
+		return -EAGAIN;
+	}
+
 	wait_pending_remove(r);
 
 	r->res_first_lkid = lkb->lkb_id;
@@ -2828,7 +2838,8 @@ static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags,
 	if (flags & DLM_LKF_CONVDEADLK && !(flags & DLM_LKF_CONVERT))
 		goto out;
 
-	if (flags & DLM_LKF_CONVDEADLK && flags & DLM_LKF_NOQUEUE)
+	if (flags & DLM_LKF_CONVDEADLK && (flags & (DLM_LKF_NOQUEUE |
+						    DLM_LKF_NOLOOKUP)))
 		goto out;
 
 	if (flags & DLM_LKF_EXPEDITE && flags & DLM_LKF_CONVERT)
@@ -2837,7 +2848,8 @@ static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags,
 	if (flags & DLM_LKF_EXPEDITE && flags & DLM_LKF_QUECVT)
 		goto out;
 
-	if (flags & DLM_LKF_EXPEDITE && flags & DLM_LKF_NOQUEUE)
+	if (flags & DLM_LKF_EXPEDITE && (flags & (DLM_LKF_NOQUEUE |
+						  DLM_LKF_NOLOOKUP)))
 		goto out;
 
 	if (flags & DLM_LKF_EXPEDITE && mode != DLM_LOCK_NL)
diff --git a/include/uapi/linux/dlmconstants.h b/include/uapi/linux/dlmconstants.h
index 47bf08d..4b9ba15 100644
--- a/include/uapi/linux/dlmconstants.h
+++ b/include/uapi/linux/dlmconstants.h
@@ -131,6 +131,12 @@
  * Unlock the lock even if it is converting or waiting or has sublocks.
  * Only really for use by the userland device.c code.
  *
+ * DLM_LKF_NOLOOKUP
+ *
+ * Don't take any network time/bandwidth to do directory owner lookups.
+ * This is a lock for which we only care whether it's completely under
+ * local jurisdiction.
+ *
  */
 
 #define DLM_LKF_NOQUEUE		0x00000001
@@ -152,6 +158,7 @@
 #define DLM_LKF_ALTCW		0x00010000
 #define DLM_LKF_FORCEUNLOCK	0x00020000
 #define DLM_LKF_TIMEOUT		0x00040000
+#define DLM_LKF_NOLOOKUP	0x00080000
 
 /*
  * Some return codes that are not in errno.h
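
Editor's sketch (not part of the patch): the behaviour the patch describes can be modelled in plain userspace C. The two flag values for NOQUEUE and NOLOOKUP are copied from the diff; the values for CONVDEADLK and EXPEDITE, the -EINVAL result for a rejected combination, and the helper names `check_flag_combination` and `model_set_master` are assumptions made for illustration, so check them against your own kernel tree before relying on them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values copied from the patch. */
#define DLM_LKF_NOQUEUE     0x00000001
#define DLM_LKF_NOLOOKUP    0x00080000
/* Assumed values (believed to match dlmconstants.h, not shown in the patch). */
#define DLM_LKF_CONVDEADLK  0x00000040
#define DLM_LKF_EXPEDITE    0x00000400

/*
 * Hypothetical helper modelling the two checks the patch touches in
 * set_lock_args(): CONVDEADLK and EXPEDITE are rejected when combined
 * with either "fail fast" flag. Returns 0 if allowed, -EINVAL (assumed)
 * if the combination is refused.
 */
int check_flag_combination(uint32_t flags)
{
	const uint32_t fail_fast = DLM_LKF_NOQUEUE | DLM_LKF_NOLOOKUP;

	if ((flags & DLM_LKF_CONVDEADLK) && (flags & fail_fast))
		return -EINVAL;
	if ((flags & DLM_LKF_EXPEDITE) && (flags & fail_fast))
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical, simplified model of the decision added to set_master()
 * for a resource whose master is not yet known:
 *   0       -> this node is the directory node and can master locally
 *   -EAGAIN -> a remote lookup would be needed but NOLOOKUP forbids it
 *   1       -> a lookup request would be sent
 */
int model_set_master(uint32_t flags, int dir_nodeid, int our_nodeid)
{
	if (dir_nodeid == our_nodeid)
		return 0;
	if (flags & DLM_LKF_NOLOOKUP)
		return -EAGAIN;
	return 1;
}
```

A caller in this model would treat -EAGAIN as "not local, try a different resource" rather than as an error.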

* [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified
From: Steven Whitehouse @ 2014-10-01 18:04 UTC
To: cluster-devel.redhat.com

Hi,

On 01/10/14 18:21, Bob Peterson wrote:
> Hi,
>
> This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM
> to refrain from sending lookup requests in cases where the lock library
> node is not the current node. This is similar to the DLM_LKF_NOQUEUE
> flag, except it fails locks that would require a lookup, with -EAGAIN.
>
> This is not just about saving a network operation. It allows callers
> like GFS2 to master locks for which they are the directory node. Each
> node can then "prefer" local locks, especially in the case of GFS2
> selecting resource groups for block allocations (implemented with a
> separate patch). This mastering of local locks distributes the locks
> between the nodes (at least until nodes enter or leave the cluster),
> which tends to make each node "keep to itself" when doing allocations.
> Thus, dlm communications are kept to a minimum, which results in
> significantly faster block allocations.

I think we need to do some more investigation here... how long do the
lookups take? If the issue is just to create a list of preferred rgrps
for each node, then there are various ways in which we might do that.
That is not to say that this isn't a good way to do it, but I think we
should try to understand the timings here first and make sure that we
are solving the right problem,

Steve.

* [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified
From: Bob Peterson @ 2014-10-03 17:24 UTC
To: cluster-devel.redhat.com

----- Original Message -----
> Hi,
>
> I think we need to do some more investigation here... how long do the
> lookups take? If the issue is just to create a list of preferred rgrps
> for each node, then there are various ways in which we might do that.
> That is not to say that this isn't a good way to do it, but I think we
> should try to understand the timings here first and make sure that we
> are solving the right problem,
>
> Steve.

Hi,

Contrary to my previous findings (which I think I may have screwed up),
I've done further investigation and found that the DLM lookups aren't
the issue at all. Also, making the DLM master the lookup node doesn't
seem to make much difference either. So I'm scrapping this dlm patch.

My latest set of patches now includes one that evenly distributes a
preferred set of rgrps to each node. Unlike Dave's original algorithm
in GFS1, this is a round-robin scheme that makes every node "prefer"
every Nth rgrp, where N is the number of nodes. This seems to be just
as fast if not faster than messing around with DLM.

Hopefully I'll be posting more patches in the near future. I'm currently
running some more tests regarding minimum reservations, and I'll post
patches depending on those results.

Regards,

Bob Peterson
Red Hat File Systems
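
Editor's sketch (not the GFS2 patch itself): the round-robin preference described above, where each node "prefers" every Nth rgrp, can be expressed as a modulo test. The function names and the idea of a zero-based `node_index` identifying each node's position in the cluster are hypothetical; the real patch may derive the node's slot differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: with nr_nodes nodes, node k prefers rgrps
 * k, k + nr_nodes, k + 2 * nr_nodes, and so on.
 */
bool rgrp_is_preferred(size_t rgrp_index, unsigned int node_index,
		       unsigned int nr_nodes)
{
	assert(nr_nodes > 0 && node_index < nr_nodes);
	return rgrp_index % nr_nodes == node_index;
}

/* Count how many of the first nr_rgrps rgrps a given node prefers. */
size_t count_preferred(size_t nr_rgrps, unsigned int node_index,
		       unsigned int nr_nodes)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_rgrps; i++)
		if (rgrp_is_preferred(i, node_index, nr_nodes))
			count++;
	return count;
}
```

With 10 rgrps and 3 nodes, this split gives the nodes 4, 3 and 3 preferred rgrps, so the shares differ by at most one. Note that in this sketch the assignment changes for most rgrps whenever the node count changes.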

* [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified
From: David Teigland @ 2014-10-01 18:42 UTC
To: cluster-devel.redhat.com

On Wed, Oct 01, 2014 at 01:21:41PM -0400, Bob Peterson wrote:
> Hi,
>
> This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM
> to refrain from sending lookup requests in cases where the lock library
> node is not the current node. This is similar to the DLM_LKF_NOQUEUE
> flag, except it fails locks that would require a lookup, with -EAGAIN.

Can we just use NOQUEUE? It tells you that there's a lock conflict, which
tells you to move along and try another if you don't want to contend. If
you cache acquired locks and reuse them, then it doesn't matter if the
master node is remote or local.

If lookups are a problem in general, there is the "nodir" lockspace mode,
which replaces the resource directory lookup system with a static mapping
of resources to master nodes.

> This is not just about saving a network operation. It allows callers
> like GFS2 to master locks for which they are the directory node. Each
> node can then "prefer" local locks, especially in the case of GFS2
> selecting resource groups for block allocations (implemented with a
> separate patch). This mastering of local locks distributes the locks
> between the nodes (at least until nodes enter or leave the cluster),
> which tends to make each node "keep to itself" when doing allocations.
> Thus, dlm communications are kept to a minimum, which results in
> significantly faster block allocations.

Back in 2002 I solved what sounds like the same problem in gfs(1). It
allowed all nodes to allocate blocks independent of each other, without
constant locking. You can see the solution here:

https://git.fedorahosted.org/cgit/cluster.git/tree/gfs-kernel/src/gfs/rgrp.c?h=RHEL4
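
Editor's sketch (a userspace model, not DLM API usage): the "try NOQUEUE, and on conflict move along to another" pattern suggested above can be written as a loop over candidate resources. The callback type `try_lock_fn`, the function `pick_uncontended`, and the bitmask-based stand-in `demo_try_lock` are all hypothetical names; the only DLM behaviour assumed is that a NOQUEUE request that cannot be granted immediately fails with -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback standing in for a NOQUEUE lock request:
 * returns 0 if the lock was granted at once, -EAGAIN if it would
 * have had to wait behind a conflicting lock.
 */
typedef int (*try_lock_fn)(size_t resource, void *ctx);

/*
 * Walk the resources starting at 'start', wrapping around, and return
 * the index of the first one that can be locked without contention.
 * Returns -1 if every candidate is contended or a request fails with
 * an error other than -EAGAIN.
 */
long pick_uncontended(size_t nr_resources, size_t start,
		      try_lock_fn try_lock, void *ctx)
{
	for (size_t i = 0; i < nr_resources; i++) {
		size_t candidate = (start + i) % nr_resources;
		int rv = try_lock(candidate, ctx);

		if (rv == 0)
			return (long)candidate;
		if (rv != -EAGAIN)
			return -1;	/* unexpected error: give up */
	}
	return -1;
}

/* Demo stand-in: ctx points to a bitmask of resources held elsewhere. */
int demo_try_lock(size_t resource, void *ctx)
{
	unsigned long busy_mask = *(unsigned long *)ctx;

	return (busy_mask & (1UL << resource)) ? -EAGAIN : 0;
}

/* Convenience wrapper for the demo; nr_resources must be at most 32. */
long demo_pick(unsigned long busy_mask, size_t nr_resources, size_t start)
{
	return pick_uncontended(nr_resources, start, demo_try_lock, &busy_mask);
}
```

In this model it makes no difference whether a lock's master is local or remote; only contention matters, which is the point made above about caching and reusing acquired locks.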

* [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified
From: Bob Peterson @ 2014-10-03 17:28 UTC
To: cluster-devel.redhat.com

----- Original Message -----
> Can we just use NOQUEUE? It tells you that there's a lock conflict, which
> tells you to move along and try another if you don't want to contend. If
> you cache acquired locks and reuse them, then it doesn't matter if the
> master node is remote or local.

The problem with NOQUEUE is that it seems to depend on circumstances.
If each node mounts the file system at a separate time, and as part of
that process, they do a NOQUEUE lock on every rgrp, they all are granted
the lock. With this scheme, they're evenly divided between the nodes.

It doesn't matter anyway, because I'm scrapping the DLM patch in favor
of a scheme like the one you pointed out in GFS1 below. See my other
email for more details.

> If lookups are a problem in general, there is the "nodir" lockspace mode,
> which replaces the resource directory lookup system with a static mapping
> of resources to master nodes.
> (snip)
> Back in 2002 I solved what sounds like the same problem in gfs(1). It
> allowed all nodes to allocate blocks independent of each other, without
> constant locking. You can see the solution here:
>
> https://git.fedorahosted.org/cgit/cluster.git/tree/gfs-kernel/src/gfs/rgrp.c?h=RHEL4

Regards,

Bob Peterson
Red Hat File Systems