From: Byungchul Park <byungchul.park@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: torvalds@linux-foundation.org, damien.lemoal@opensource.wdc.com,
linux-ide@vger.kernel.org, adilger.kernel@dilger.ca,
linux-ext4@vger.kernel.org, mingo@redhat.com,
linux-kernel@vger.kernel.org, peterz@infradead.org,
will@kernel.org, tglx@linutronix.de, rostedt@goodmis.org,
joel@joelfernandes.org, sashal@kernel.org,
daniel.vetter@ffwll.ch, chris@chris-wilson.co.uk,
duyuyang@gmail.com, johannes.berg@intel.com, tj@kernel.org,
tytso@mit.edu, willy@infradead.org, david@fromorbit.com,
amir73il@gmail.com, bfields@fieldses.org,
gregkh@linuxfoundation.org, kernel-team@lge.com,
linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@kernel.org,
minchan@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
sj@kernel.org, jglisse@redhat.com, dennis@kernel.org,
cl@linux.com, penberg@kernel.org, rientjes@google.com,
vbabka@suse.cz, ngupta@vflare.org, linux-block@vger.kernel.org,
axboe@kernel.dk, paolo.valente@linaro.org, josef@toxicpanda.com,
linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
jack@suse.com, jlayton@kernel.org, dan.j.williams@intel.com,
hch@infradead.org, djwong@kernel.org,
dri-devel@lists.freedesktop.org, airlied@linux.ie,
rodrigosiqueiramelo@gmail.com, melissa.srw@gmail.com,
hamohammed.sa@gmail.com
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1
Date: Fri, 4 Mar 2022 10:56:51 +0900 [thread overview]
Message-ID: <20220304015650.GC6112@X58A-UD3R> (raw)
In-Reply-To: <20220303095456.kym32pxshwryescx@quack3.lan>
On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> >
> > consumer producer
> >
> > lock L
> > (too much work queued == true)
> > unlock L
> > --- preempted
> > lock L
> > unlock L
> > do work
> > lock L
> > unlock L
> > do work
> > ...
> > (no work == true)
> > sleep
> > --- scheduled in
> > sleep
> >
> > This code leads a deadlock without another wakeup source, say, not safe.
>
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...
Nice! Thanks for the explanation. I will take it into account if needed.
> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> >
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> >
> > consumer producer
> >
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > ... ...
> >
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> >
> > consumer producer
> >
> > sleep <- initiates region 1
> > --- region 1 starts
> > ... ...
> > --- region 1 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 2
> > --- region 2 starts
> > ... ...
> > --- region 2 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> > sleep <- initiates region 3
> > --- region 3 starts
> > ... ...
> > --- region 3 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 4
> > --- region 4 starts
> > ... ...
> > --- region 4 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> >
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?
Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)
ideal view
-----------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X [S]
--- ideally region 1 starts here
wait for the event ...
sleep if (can see REQUESTEVENT written)
it's on the way to the event
...
...
--- ideally region 1 ends here
finally the event [E]
Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.
Dept's view
------------
context X context Y
request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X
--- region 2 Dept gives up starts
wait for the event ...
write barrier
write WAITSTART read barrier
sleep when (notice WAITSTART written)
ensure the request has come [S]
--- region 2 Dept gives up ends
--- region 3 starts here
...
if (can see WAITSTART written)
it's on the way to the event
...
...
--- region 3 ends here
finally the event [E]
In short, Dept works with region 3.
Thanks,
Byungchul
next prev parent reply other threads:[~2022-03-04 1:57 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
2022-02-17 15:54 ` Steven Rostedt
2022-02-17 17:36 ` Steven Rostedt
2022-02-18 6:09 ` Byungchul Park
2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
2022-02-17 15:55 ` Steven Rostedt
2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread Byungchul Park
2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
2022-02-17 11:10 ` Report 2 " Byungchul Park
2022-02-21 19:02 ` Jan Kara
2022-02-23 0:35 ` Byungchul Park
2022-02-23 14:48 ` Jan Kara
2022-02-24 1:11 ` Byungchul Park
2022-02-24 10:22 ` Jan Kara
2022-02-28 9:28 ` Byungchul Park
2022-02-28 10:14 ` Jan Kara
2022-02-28 21:25 ` Theodore Ts'o
2022-03-03 1:36 ` Byungchul Park
2022-03-03 1:00 ` Byungchul Park
2022-03-03 2:32 ` Theodore Ts'o
2022-03-03 5:23 ` Byungchul Park
2022-03-03 14:36 ` Theodore Ts'o
2022-03-04 0:42 ` Byungchul Park
2022-03-05 3:26 ` Theodore Ts'o
2022-03-05 14:15 ` Byungchul Park
2022-03-05 15:05 ` Joel Fernandes
2022-03-07 2:43 ` Byungchul Park
2022-03-04 3:20 ` Byungchul Park
2022-03-05 3:40 ` Theodore Ts'o
2022-03-05 14:55 ` Byungchul Park
2022-03-05 15:12 ` Reimar Döffinger
2022-03-06 3:30 ` Theodore Ts'o
2022-03-06 10:51 ` Byungchul Park
2022-03-06 14:19 ` Theodore Ts'o
2022-03-10 1:45 ` Byungchul Park
2022-03-03 9:54 ` Jan Kara
2022-03-04 1:56 ` Byungchul Park [this message]
2022-02-17 13:27 ` Report 1 " Matthew Wilcox
2022-02-18 0:41 ` Byungchul Park
2022-02-22 8:27 ` Jan Kara
2022-02-23 1:40 ` Byungchul Park
2022-02-23 3:30 ` Byungchul Park
2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
2022-02-17 17:00 ` Steven Rostedt
2022-02-17 17:06 ` Matthew Wilcox
2022-02-19 10:05 ` Byungchul Park
2022-02-18 4:19 ` Theodore Ts'o
2022-02-19 10:34 ` Byungchul Park
2022-02-19 10:18 ` Byungchul Park
2022-02-19 9:54 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220304015650.GC6112@X58A-UD3R \
--to=byungchul.park@lge.com \
--cc=adilger.kernel@dilger.ca \
--cc=airlied@linux.ie \
--cc=akpm@linux-foundation.org \
--cc=amir73il@gmail.com \
--cc=axboe@kernel.dk \
--cc=bfields@fieldses.org \
--cc=chris@chris-wilson.co.uk \
--cc=cl@linux.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dan.j.williams@intel.com \
--cc=daniel.vetter@ffwll.ch \
--cc=david@fromorbit.com \
--cc=dennis@kernel.org \
--cc=djwong@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=duyuyang@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=hamohammed.sa@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=jack@suse.com \
--cc=jack@suse.cz \
--cc=jglisse@redhat.com \
--cc=jlayton@kernel.org \
--cc=joel@joelfernandes.org \
--cc=johannes.berg@intel.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@lge.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=melissa.srw@gmail.com \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=mingo@redhat.com \
--cc=ngupta@vflare.org \
--cc=paolo.valente@linaro.org \
--cc=penberg@kernel.org \
--cc=peterz@infradead.org \
--cc=rientjes@google.com \
--cc=rodrigosiqueiramelo@gmail.com \
--cc=rostedt@goodmis.org \
--cc=sashal@kernel.org \
--cc=sj@kernel.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=vbabka@suse.cz \
--cc=vdavydov.dev@gmail.com \
--cc=viro@zeniv.linux.org.uk \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).