* Re: [RFC] DEPT(DEPendency Tracker) with DLM(Distributed Lock Manager)
[not found] <20250522052453.GA42746@system.software.com>
@ 2025-05-26 0:11 ` Alexander Aring
[not found] ` <20250522052806.GB42746@system.software.com>
1 sibling, 0 replies; 4+ messages in thread
From: Alexander Aring @ 2025-05-26 0:11 UTC (permalink / raw)
To: Byungchul Park; +Cc: kernel_team, linux-kernel, gfs2
Hi Byungchul,
On Thu, May 22, 2025 at 1:40 AM Byungchul Park <byungchul@sk.com> wrote:
>
> Hi Alexander,
>
> We briefly talked about dept with DLM in an external channel. However,
> it'd be great to discuss what to aim and how to make it in more detail,
> in this mailing list.
>
> It's worth noting that dept doesn't track dependencies beyond different
> contexts to avoid adding false dependencies by any chance, which means
> though dept checks the dependency sanity *globally*, when it comes to
> creating dependencies, it happens only within e.g. each single system
> call context, each single irq context, each worker context, and so on,
> with its unique context id assigned to each independent context.
>
> In order for dept to work on DLM, we need a way to assign a unique
> context id to each interesting context in DLM's point of view, and let
> dept know the id. Once making it done, I think dept can work on DLM
> perfectly.
>
> Thoughts or any concern?
I think the unique context would be the "lock resource". The lock
resource name is a byte array that identifies the lock context uniquely
across the cluster. It is passed as the "name" parameter of
"dlm_lock()" [0].
We don't have a unique id for it, but I guess one could be created somehow.
The locking context in DLM is per node, but we don't do everything
locally. DLM has a locking protocol over the network and works with
lock masters. A lock master is a single node in the network that is
chosen to perform all lock operations for a resource. Creating and
maintaining a unique lock context id is, I think, more difficult with
real networking and several nodes in a network. I also think that DEPT
is not capable of running in such a distributed environment right now.
However, an RFC patch series [2] is pending that adds net-namespace
support, so everything can be "simulated" on one Linux kernel instance
with one net-namespace per node. With such functionality DEPT can be
used to find deadlocks in DLM applications. A unique lock context id
for the resource name can be easily created, as we know everything
about the network in a local environment.
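To make the "unique lock context id" idea concrete, here is a minimal
userspace sketch, assuming all simulated nodes live in one kernel
instance and can share one table. It maps a resource name (byte array
plus length) to a small integer id. Every name in it (res_table,
res_ctx_id, the size limits) is hypothetical and is not taken from DLM
or DEPT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of this is DLM or DEPT code. */
#define RES_NAME_MAX  64   /* assumed upper bound on a resource name */
#define RES_TABLE_MAX 128  /* assumed capacity of this sketch table */

struct res_entry {
	unsigned char name[RES_NAME_MAX];
	size_t namelen;
};

struct res_table {
	struct res_entry entries[RES_TABLE_MAX];
	uint32_t count;
};

/*
 * Return a stable, 1-based id for the given resource name, allocating a
 * new id the first time a name is seen. Returns 0 if the table is full.
 */
static uint32_t res_ctx_id(struct res_table *t, const void *name,
			   size_t namelen)
{
	uint32_t i;

	assert(namelen > 0 && namelen <= RES_NAME_MAX);

	/* Linear search: same length and same bytes means same resource. */
	for (i = 0; i < t->count; i++) {
		if (t->entries[i].namelen == namelen &&
		    memcmp(t->entries[i].name, name, namelen) == 0)
			return i + 1;
	}

	if (t->count == RES_TABLE_MAX)
		return 0;

	memcpy(t->entries[t->count].name, name, namelen);
	t->entries[t->count].namelen = namelen;
	t->count++;
	return t->count;
}
```

A linear table keeps the sketch short; a real version would need
locking and probably a hash table, and in a truly distributed setup the
ids would have to be agreed on between nodes.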
This is just a proof of concept; how it works in a real distributed
system would be another question...
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c?h=v6.15-rc7#n3372
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c?h=v6.15-rc7#n694
[2] https://lore.kernel.org/gfs2/20240930201358.2638665-1-aahringo@redhat.com/
* Re: [RFC] DEPT(DEPendency Tracker) with DLM(Distributed Lock Manager)
[not found] ` <20250522052806.GB42746@system.software.com>
@ 2025-05-26 0:13 ` Alexander Aring
2025-05-28 12:00 ` Alexander Aring
0 siblings, 1 reply; 4+ messages in thread
From: Alexander Aring @ 2025-05-26 0:13 UTC (permalink / raw)
To: Byungchul Park; +Cc: kernel_team, linux-kernel, gfs2
Hi,
On Thu, May 22, 2025 at 1:28 AM Byungchul Park <byungchul@sk.com> wrote:
>
> On Thu, May 22, 2025 at 02:24:53PM +0900, Byungchul Park wrote:
> > Hi Alexander,
> >
> > We briefly talked about dept with DLM in an external channel. However,
> > it'd be great to discuss what to aim and how to make it in more detail,
> > in this mailing list.
> >
> > It's worth noting that dept doesn't track dependencies beyond different
> > contexts to avoid adding false dependencies by any chance, which means
> > though dept checks the dependency sanity *globally*, when it comes to
> > creating dependencies, it happens only within e.g. each single system
> > call context, each single irq context, each worker context, and so on,
> > with its unique context id assigned to each independent context.
> >
> > In order for dept to work on DLM, we need a way to assign a unique
> > context id to each interesting context in DLM's point of view, and let
> > dept know the id. Once making it done, I think dept can work on DLM
> > perfectly.
>
> Plus, we need a way to share the global dependency graph used by dept
> between nodes too.
>
Having everything simulated, with nodes separated as net-namespaces in
one Linux kernel instance, is I think simpler to do at first and will
serve as a proof of concept.
Sharing data between nodes is then just a matter of some memory area
that is not separated per "struct net" context.
- Alex
* Re: [RFC] DEPT(DEPendency Tracker) with DLM(Distributed Lock Manager)
2025-05-26 0:13 ` Alexander Aring
@ 2025-05-28 12:00 ` Alexander Aring
2025-05-29 7:22 ` Byungchul Park
0 siblings, 1 reply; 4+ messages in thread
From: Alexander Aring @ 2025-05-28 12:00 UTC (permalink / raw)
To: Byungchul Park; +Cc: kernel_team, linux-kernel, gfs2
Hi,
On Sun, May 25, 2025 at 8:13 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Thu, May 22, 2025 at 1:28 AM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Thu, May 22, 2025 at 02:24:53PM +0900, Byungchul Park wrote:
> > > Hi Alexander,
> > >
> > > We briefly talked about dept with DLM in an external channel. However,
> > > it'd be great to discuss what to aim and how to make it in more detail,
> > > in this mailing list.
> > >
> > > It's worth noting that dept doesn't track dependencies beyond different
> > > contexts to avoid adding false dependencies by any chance, which means
> > > though dept checks the dependency sanity *globally*, when it comes to
> > > creating dependencies, it happens only within e.g. each single system
> > > call context, each single irq context, each worker context, and so on,
> > > with its unique context id assigned to each independent context.
> > >
> > > In order for dept to work on DLM, we need a way to assign a unique
> > > context id to each interesting context in DLM's point of view, and let
> > > dept know the id. Once making it done, I think dept can work on DLM
> > > perfectly.
> >
> > Plus, we need a way to share the global dependency graph used by dept
> > between nodes too.
> >
>
> Having everything simulated and having nodes separated as
> net-namespaces in one Linux kernel instance is I think at first
> simpler to do and will show the "proof of concepts".
> Sharing data between nodes is then just some memory area that is not
> separated by per "struct net" context.
Alternatively, the master node of the lock can be used to detect
cycles. This node knows everything about the lock operations being
done, including the nodes that are waiting to get the lock granted. We
already do that for some simple cases when converting locks directly
[0]. Maybe this is already enough to have all the information, but it
is not just a "wait_event()" mechanism, so would there need to be some
other API to use DEPT for this case?
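As an illustration of what cycle detection on a master could look like,
here is a generic wait-for graph sketch in userspace C. It is not the
existing conversion-deadlock check in fs/dlm/lock.c, and it assumes one
place sees all wait relations, which a single lock master does not
necessarily do. The names wait_graph, reaches and wait_graph_add are
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16 /* assumed cluster size for the sketch */

/* waits_for[a][b] is true when node a waits for a lock held by node b. */
struct wait_graph {
	bool waits_for[MAX_NODES][MAX_NODES];
};

/* Depth-first search: can "target" be reached from "from"? */
static bool reaches(const struct wait_graph *g, int from, int target,
		    bool *visited)
{
	int next;

	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;

	for (next = 0; next < MAX_NODES; next++) {
		if (g->waits_for[from][next] &&
		    reaches(g, next, target, visited))
			return true;
	}
	return false;
}

/*
 * Record "waiter waits for holder". Returns false and records nothing
 * if the new edge would close a cycle, i.e. if holder already waits,
 * directly or transitively, for waiter.
 */
static bool wait_graph_add(struct wait_graph *g, int waiter, int holder)
{
	bool visited[MAX_NODES];

	assert(waiter >= 0 && waiter < MAX_NODES);
	assert(holder >= 0 && holder < MAX_NODES);

	memset(visited, 0, sizeof(visited));
	if (reaches(g, holder, waiter, visited))
		return false;

	g->waits_for[waiter][holder] = true;
	return true;
}
```

With edges 0 -> 1 and 1 -> 2 recorded, trying to add 2 -> 0 is rejected
because node 0 already waits, through node 1, for node 2.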
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c?h=v6.15#n2163
* Re: [RFC] DEPT(DEPendency Tracker) with DLM(Distributed Lock Manager)
2025-05-28 12:00 ` Alexander Aring
@ 2025-05-29 7:22 ` Byungchul Park
0 siblings, 0 replies; 4+ messages in thread
From: Byungchul Park @ 2025-05-29 7:22 UTC (permalink / raw)
To: Alexander Aring; +Cc: kernel_team, linux-kernel, gfs2
On Wed, May 28, 2025 at 08:00:02AM -0400, Alexander Aring wrote:
> Hi,
>
> On Sun, May 25, 2025 at 8:13 PM Alexander Aring <aahringo@redhat.com> wrote:
> >
> > Hi,
> >
> > On Thu, May 22, 2025 at 1:28 AM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > On Thu, May 22, 2025 at 02:24:53PM +0900, Byungchul Park wrote:
> > > > Hi Alexander,
> > > >
> > > > We briefly talked about dept with DLM in an external channel. However,
> > > > it'd be great to discuss what to aim and how to make it in more detail,
> > > > in this mailing list.
> > > >
> > > > It's worth noting that dept doesn't track dependencies beyond different
> > > > contexts to avoid adding false dependencies by any chance, which means
> > > > though dept checks the dependency sanity *globally*, when it comes to
> > > > creating dependencies, it happens only within e.g. each single system
> > > > call context, each single irq context, each worker context, and so on,
> > > > with its unique context id assigned to each independent context.
> > > >
> > > > In order for dept to work on DLM, we need a way to assign a unique
> > > > context id to each interesting context in DLM's point of view, and let
> > > > dept know the id. Once making it done, I think dept can work on DLM
> > > > perfectly.
> > >
> > > Plus, we need a way to share the global dependency graph used by dept
> > > between nodes too.
> > >
> >
> > Having everything simulated and having nodes separated as
> > net-namespaces in one Linux kernel instance is I think at first
> > simpler to do and will show the "proof of concepts".
> > Sharing data between nodes is then just some memory area that is not
> > separated by per "struct net" context.
>
> Alternatively the master node of the lock (this node knows everything
> about the lock operations being done including the nodes that are
> waiting to get the lock granted) can be used to detect cycles, we
Sounds good.
> already do that for some simple cases when converting locks directly
> [0]. Maybe this is already enough to have all the information, but it
It seems that DLM already tries to detect a deadlock. Can you provide
an example scenario where the current detection logic doesn't work?
It'd help me define what to do for better DLM and dept.
> is not just a "wait_event()" mechanism, there needs to be some other
> API to use DEPT for this case?
It'd be required to modify dept to work with isolated context ids, each
id corresponding to a node rather than to a simple kernel context such
as a system call or irq context. I think that is not hard to implement.
To answer your question, we might need to add a few dept annotations.
I'm afraid I don't understand well enough how DLM works, but for
example:
   1. When receiving a lock(L1) request from a node(N1) that might wait,
      annotate dept_wait(L1, events that can wake up N1) for N1 context,
      where events are all the events that can wake up N1 from waiting.
   2. When receiving a lock(L1) request from a node(N1) that does not
      involve a wait but just tries,
      no need to annotate.
   3. When the request is granted,
      annotate dept_request_event(L1), which means there might be waiters
      for L1 to be released from now on.
   4. When releasing the lock(L1), no matter who releases the lock (the
      releaser doesn't have to be N1),
      annotate dept_event(L1) for the releaser.
Roughly, these annotations are needed, but again, it'd be helpful if you
provide an example scenario where the current detection logic you have
doesn't work, for better discussion.
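The four cases above could be wired into a lock master roughly as in
the following userspace sketch. It only records which annotation would
be issued and for which node context. The annotate() helper, the enum
and the toy_lock structure are hypothetical stand-ins; the real dept
functions take different arguments, and queueing of waiters is left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the dept annotations named in the list. */
enum annot { ANNOT_WAIT, ANNOT_REQUEST_EVENT, ANNOT_EVENT };

#define LOG_MAX 32

struct annot_log {
	enum annot kind[LOG_MAX];
	int lock_id[LOG_MAX];
	int ctx_id[LOG_MAX]; /* one context id per node (assumption) */
	int len;
};

static void annotate(struct annot_log *log, enum annot kind, int lock_id,
		     int ctx_id)
{
	assert(log->len < LOG_MAX);
	log->kind[log->len] = kind;
	log->lock_id[log->len] = lock_id;
	log->ctx_id[log->len] = ctx_id;
	log->len++;
}

struct toy_lock {
	int id;
	int holder; /* node id of the holder, or -1 when free */
};

/* A request for lk arrives at the master from "node". */
static bool master_request(struct annot_log *log, struct toy_lock *lk,
			   int node, bool may_wait)
{
	/* Case 1: a request that might wait is annotated as a wait. */
	if (may_wait)
		annotate(log, ANNOT_WAIT, lk->id, node);
	/* Case 2: a try-only request gets no annotation. */

	if (lk->holder >= 0)
		return false; /* not granted; waiter queueing omitted */

	/* Case 3: granted, so waiters for its release may now appear. */
	lk->holder = node;
	annotate(log, ANNOT_REQUEST_EVENT, lk->id, node);
	return true;
}

/* Case 4: the release is annotated in the releaser's context. */
static void master_release(struct annot_log *log, struct toy_lock *lk,
			   int releaser)
{
	assert(lk->holder >= 0);
	lk->holder = -1;
	annotate(log, ANNOT_EVENT, lk->id, releaser);
}
```

The releaser is passed separately from the holder on purpose, since
case 4 says the releaser does not have to be N1.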
Byungchul
>
> - Alex
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c?h=v6.15#n2163