From: Byungchul Park <byungchul.park@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: torvalds@linux-foundation.org, damien.lemoal@opensource.wdc.com,
linux-ide@vger.kernel.org, adilger.kernel@dilger.ca,
linux-ext4@vger.kernel.org, mingo@redhat.com,
linux-kernel@vger.kernel.org, peterz@infradead.org,
will@kernel.org, tglx@linutronix.de, rostedt@goodmis.org,
joel@joelfernandes.org, sashal@kernel.org,
daniel.vetter@ffwll.ch, chris@chris-wilson.co.uk,
duyuyang@gmail.com, johannes.berg@intel.com, tj@kernel.org,
tytso@mit.edu, willy@infradead.org, david@fromorbit.com,
amir73il@gmail.com, bfields@fieldses.org,
gregkh@linuxfoundation.org, kernel-team@lge.com,
linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@kernel.org,
minchan@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
sj@kernel.org, jglisse@redhat.com, dennis@kernel.org,
cl@linux.com, penberg@kernel.org, rientjes@google.com,
vbabka@suse.cz, ngupta@vflare.org, linux-block@vger.kernel.org,
axboe@kernel.dk, paolo.valente@linaro.org, josef@toxicpanda.com,
linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
jack@suse.com, jlayton@kernel.org, dan.j.williams@intel.com,
hch@infradead.org, djwong@kernel.org,
dri-devel@lists.freedesktop.org, airlied@linux.ie,
rodrigosiqueiramelo@gmail.com, melissa.srw@gmail.com,
hamohammed.sa@gmail.com
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1
Date: Fri, 4 Mar 2022 10:56:51 +0900 [thread overview]
Message-ID: <20220304015650.GC6112@X58A-UD3R> (raw)
In-Reply-To: <20220303095456.kym32pxshwryescx@quack3.lan>
On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> >
> > consumer producer
> >
> > lock L
> > (too much work queued == true)
> > unlock L
> > --- preempted
> > lock L
> > unlock L
> > do work
> > lock L
> > unlock L
> > do work
> > ...
> > (no work == true)
> > sleep
> > --- scheduled in
> > sleep
> >
> > This code leads a deadlock without another wakeup source, say, not safe.
>
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...
Nice! Thanks for the explanation. I will take it into account if needed.
> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> >
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> >
> > consumer producer
> >
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > ... ...
> >
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> >
> > consumer producer
> >
> > sleep <- initiates region 1
> > --- region 1 starts
> > ... ...
> > --- region 1 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 2
> > --- region 2 starts
> > ... ...
> > --- region 2 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> > sleep <- initiates region 3
> > --- region 3 starts
> > ... ...
> > --- region 3 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 4
> > --- region 4 starts
> > ... ...
> > --- region 4 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> >
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?
Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)
ideal view
-----------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X [S]
--- ideally region 1 starts here
wait for the event ...
sleep if (can see REQUESTEVENT written)
it's on the way to the event
...
...
--- ideally region 1 ends here
finally the event [E]
Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.
Dept's view
------------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X
--- region 2 Dept gives up starts
wait for the event ...
write barrier
write WAITSTART read barrier
sleep when (notice WAITSTART written)
ensure the request has come [S]
--- region 2 Dept gives up ends
--- region 3 starts here
...
if (can see WAITSTART written)
it's on the way to the event
...
...
--- region 3 ends here
finally the event [E]
In short, Dept works with region 3.
Thanks,
Byungchul
WARNING: multiple messages have this Message-ID (diff)
From: Byungchul Park <byungchul.park@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: hamohammed.sa@gmail.com, peterz@infradead.org,
daniel.vetter@ffwll.ch, amir73il@gmail.com, david@fromorbit.com,
dri-devel@lists.freedesktop.org, chris@chris-wilson.co.uk,
bfields@fieldses.org, linux-ide@vger.kernel.org,
adilger.kernel@dilger.ca, joel@joelfernandes.org, cl@linux.com,
will@kernel.org, duyuyang@gmail.com, sashal@kernel.org,
paolo.valente@linaro.org, damien.lemoal@opensource.wdc.com,
willy@infradead.org, hch@infradead.org, airlied@linux.ie,
mingo@redhat.com, djwong@kernel.org, vdavydov.dev@gmail.com,
rientjes@google.com, dennis@kernel.org,
linux-ext4@vger.kernel.org, linux-mm@kvack.org,
ngupta@vflare.org, johannes.berg@intel.com, jack@suse.com,
dan.j.williams@intel.com, josef@toxicpanda.com,
rostedt@goodmis.org, linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, jglisse@redhat.com,
viro@zeniv.linux.org.uk, tglx@linutronix.de, mhocko@kernel.org,
vbabka@suse.cz, axboe@kernel.dk, melissa.srw@gmail.com,
sj@kernel.org, tytso@mit.edu, rodrigosiqueiramelo@gmail.com,
kernel-team@lge.com, gregkh@linuxfoundation.org,
jlayton@kernel.org, linux-kernel@vger.kernel.org,
penberg@kernel.org, minchan@kernel.org, hannes@cmpxchg.org,
tj@kernel.org, akpm@linux-foundation.org,
torvalds@linux-foundation.org
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1
Date: Fri, 4 Mar 2022 10:56:51 +0900 [thread overview]
Message-ID: <20220304015650.GC6112@X58A-UD3R> (raw)
In-Reply-To: <20220303095456.kym32pxshwryescx@quack3.lan>
On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> >
> > consumer producer
> >
> > lock L
> > (too much work queued == true)
> > unlock L
> > --- preempted
> > lock L
> > unlock L
> > do work
> > lock L
> > unlock L
> > do work
> > ...
> > (no work == true)
> > sleep
> > --- scheduled in
> > sleep
> >
> > This code leads a deadlock without another wakeup source, say, not safe.
>
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...
Nice! Thanks for the explanation. I will take it into account if needed.
> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> >
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> >
> > consumer producer
> >
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > ... ...
> >
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> >
> > consumer producer
> >
> > sleep <- initiates region 1
> > --- region 1 starts
> > ... ...
> > --- region 1 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 2
> > --- region 2 starts
> > ... ...
> > --- region 2 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> > sleep <- initiates region 3
> > --- region 3 starts
> > ... ...
> > --- region 3 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 4
> > --- region 4 starts
> > ... ...
> > --- region 4 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> >
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?
Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)
ideal view
-----------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X [S]
--- ideally region 1 starts here
wait for the event ...
sleep if (can see REQUESTEVENT written)
it's on the way to the event
...
...
--- ideally region 1 ends here
finally the event [E]
Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.
Dept's view
------------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X
--- region 2 Dept gives up starts
wait for the event ...
write barrier
write WAITSTART read barrier
sleep when (notice WAITSTART written)
ensure the request has come [S]
--- region 2 Dept gives up ends
--- region 3 starts here
...
if (can see WAITSTART written)
it's on the way to the event
...
...
--- region 3 ends here
finally the event [E]
In short, Dept works with region 3.
Thanks,
Byungchul
next prev parent reply other threads:[~2022-03-04 1:57 UTC|newest]
Thread overview: 134+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 15:54 ` Steven Rostedt
2022-02-17 15:54 ` Steven Rostedt
2022-02-17 17:36 ` Steven Rostedt
2022-02-17 17:36 ` Steven Rostedt
2022-02-18 6:09 ` Byungchul Park
2022-02-18 6:09 ` Byungchul Park
2022-02-17 19:46 ` kernel test robot
2022-02-17 19:46 ` kernel test robot
2022-02-17 19:46 ` kernel test robot
2022-02-17 19:46 ` kernel test robot
2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 19:46 ` kernel test robot
2022-02-17 19:46 ` kernel test robot
2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 15:55 ` Steven Rostedt
2022-02-17 15:55 ` Steven Rostedt
2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked, writeback} Byungchul Park
2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
2022-02-17 10:57 ` Byungchul Park
2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
2022-02-17 11:10 ` Byungchul Park
2022-02-17 11:10 ` Report 2 " Byungchul Park
2022-02-17 11:10 ` Byungchul Park
2022-02-21 19:02 ` Jan Kara
2022-02-21 19:02 ` Jan Kara
2022-02-23 0:35 ` Byungchul Park
2022-02-23 0:35 ` Byungchul Park
2022-02-23 14:48 ` Jan Kara
2022-02-23 14:48 ` Jan Kara
2022-02-24 1:11 ` Byungchul Park
2022-02-24 1:11 ` Byungchul Park
2022-02-24 10:22 ` Jan Kara
2022-02-24 10:22 ` Jan Kara
2022-02-28 9:28 ` Byungchul Park
2022-02-28 9:28 ` Byungchul Park
2022-02-28 10:14 ` Jan Kara
2022-02-28 10:14 ` Jan Kara
2022-02-28 21:25 ` Theodore Ts'o
2022-02-28 21:25 ` Theodore Ts'o
2022-03-03 1:36 ` Byungchul Park
2022-03-03 1:36 ` Byungchul Park
2022-03-03 1:00 ` Byungchul Park
2022-03-03 1:00 ` Byungchul Park
2022-03-03 2:32 ` Theodore Ts'o
2022-03-03 2:32 ` Theodore Ts'o
2022-03-03 5:23 ` Byungchul Park
2022-03-03 5:23 ` Byungchul Park
2022-03-03 14:36 ` Theodore Ts'o
2022-03-03 14:36 ` Theodore Ts'o
2022-03-04 0:42 ` Byungchul Park
2022-03-04 0:42 ` Byungchul Park
2022-03-05 3:26 ` Theodore Ts'o
2022-03-05 3:26 ` Theodore Ts'o
2022-03-05 14:15 ` Byungchul Park
2022-03-05 14:15 ` Byungchul Park
2022-03-05 15:05 ` Joel Fernandes
2022-03-05 15:05 ` Joel Fernandes
2022-03-07 2:43 ` Byungchul Park
2022-03-07 2:43 ` Byungchul Park
2022-03-04 3:20 ` Byungchul Park
2022-03-04 3:20 ` Byungchul Park
2022-03-05 3:40 ` Theodore Ts'o
2022-03-05 3:40 ` Theodore Ts'o
2022-03-05 14:55 ` Byungchul Park
2022-03-05 14:55 ` Byungchul Park
2022-03-05 15:12 ` Reimar Döffinger
2022-03-05 15:12 ` Reimar Döffinger
2022-03-06 3:30 ` Theodore Ts'o
2022-03-06 3:30 ` Theodore Ts'o
2022-03-06 10:51 ` Byungchul Park
2022-03-06 10:51 ` Byungchul Park
2022-03-06 14:19 ` Theodore Ts'o
2022-03-06 14:19 ` Theodore Ts'o
2022-03-10 1:45 ` Byungchul Park
2022-03-10 1:45 ` Byungchul Park
2022-03-03 9:54 ` Jan Kara
2022-03-03 9:54 ` Jan Kara
2022-03-04 1:56 ` Byungchul Park [this message]
2022-03-04 1:56 ` Byungchul Park
2022-02-17 13:27 ` Report 1 " Matthew Wilcox
2022-02-17 13:27 ` Matthew Wilcox
2022-02-18 0:41 ` Byungchul Park
2022-02-18 0:41 ` Byungchul Park
2022-02-22 8:27 ` Jan Kara
2022-02-22 8:27 ` Jan Kara
2022-02-23 1:40 ` Byungchul Park
2022-02-23 1:40 ` Byungchul Park
2022-02-23 3:30 ` Byungchul Park
2022-02-23 3:30 ` Byungchul Park
2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
2022-02-17 15:51 ` Theodore Ts'o
2022-02-17 17:00 ` Steven Rostedt
2022-02-17 17:00 ` Steven Rostedt
2022-02-17 17:06 ` Matthew Wilcox
2022-02-17 17:06 ` Matthew Wilcox
2022-02-19 10:05 ` Byungchul Park
2022-02-19 10:05 ` Byungchul Park
2022-02-18 4:19 ` Theodore Ts'o
2022-02-18 4:19 ` Theodore Ts'o
2022-02-19 10:34 ` Byungchul Park
2022-02-19 10:34 ` Byungchul Park
2022-02-19 10:18 ` Byungchul Park
2022-02-19 10:18 ` Byungchul Park
2022-02-19 9:54 ` Byungchul Park
2022-02-19 9:54 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220304015650.GC6112@X58A-UD3R \
--to=byungchul.park@lge.com \
--cc=adilger.kernel@dilger.ca \
--cc=airlied@linux.ie \
--cc=akpm@linux-foundation.org \
--cc=amir73il@gmail.com \
--cc=axboe@kernel.dk \
--cc=bfields@fieldses.org \
--cc=chris@chris-wilson.co.uk \
--cc=cl@linux.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dan.j.williams@intel.com \
--cc=daniel.vetter@ffwll.ch \
--cc=david@fromorbit.com \
--cc=dennis@kernel.org \
--cc=djwong@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=duyuyang@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=hamohammed.sa@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=jack@suse.com \
--cc=jack@suse.cz \
--cc=jglisse@redhat.com \
--cc=jlayton@kernel.org \
--cc=joel@joelfernandes.org \
--cc=johannes.berg@intel.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@lge.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=melissa.srw@gmail.com \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=mingo@redhat.com \
--cc=ngupta@vflare.org \
--cc=paolo.valente@linaro.org \
--cc=penberg@kernel.org \
--cc=peterz@infradead.org \
--cc=rientjes@google.com \
--cc=rodrigosiqueiramelo@gmail.com \
--cc=rostedt@goodmis.org \
--cc=sashal@kernel.org \
--cc=sj@kernel.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=vbabka@suse.cz \
--cc=vdavydov.dev@gmail.com \
--cc=viro@zeniv.linux.org.uk \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.