From: Tim Chen <tim.c.chen@linux.intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
Kan Liang <kan.liang@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
Christopher Lameter <cl@linux.com>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Davidlohr Bueso <dave@stgolabs.net>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
Date: Fri, 25 Aug 2017 15:19:53 -0700 [thread overview]
Message-ID: <f10f4c25-49c0-7ef5-55c2-769c8fd9bf90@linux.intel.com> (raw)
In-Reply-To: <CA+55aFyErsNw8bqTOCzcrarDZBdj+Ev=1N3sV-gxtLTH03bBFQ@mail.gmail.com>
On 08/25/2017 12:58 PM, Linus Torvalds wrote:
> On Fri, Aug 25, 2017 at 9:13 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>> Now that we have added breaks in the wait queue scan and allow bookmark
>> on scan position, we put this logic in the wake_up_page_bit function.
>
> Oh, _this_ is the other patch you were talking about. I thought it was
> the NUMA counter threashold that was discussed around the same time,
> and that's why I referred to Mel.
>
> Gods, _this_ patch is ugly. No, I'm not happy with it at all. It
> makes that wait_queue_head much bigger, for this disgusting one use.
>
> So no, this is no good.
>
> Now, maybe the page_wait_table[] itself could be changed to be
> something that is *not* just the wait-queue head.
>
> But if we need to change the page_wait_table[] itself to have more
> information, then we should make it not be a wait-queue at all, we
> should make it be a list of much more specialized entries, indexed by
> the {page,bit} tuple.
>
> And once we do that, we probably *could* use something like two
> specialized lists: one that is wake-all, and one that is wake-one.
>
> So you'd have something like
>
> struct page_wait_struct {
> struct list_node list;
> struct page *page;
> int bit;
> struct llist_head all;
> struct llist_head exclusive;
> };
>
> and then the "page_wait_table[]" would be just an array of
>
> struct page_wait_head {
> spinlock_t lock;
> struct hlist_head list;
> };
>
> and then the rule is:
>
> - each page/bit combination allocates one of these page_wait_struct
> entries when somebody starts waiting for it for the first time (and we
> use the spinlock in the page_wait_head to serialize that list).
>
> - an exclusive wait (ie somebody who waits to get the bit for
> locking) adds itself to the 'exclusive' llist
>
> - non-locking waiters add themselves to the 'all' list
>
> - we can use llist_del_all() to remove the 'all' list and then walk
> it and wake them up without any locks
>
> - we can use llist_del_first() to remove the first exclusive waiter
> and wait _it_ up without any locks.
>
> Hmm? How does that sound? That means that we wouldn't use the normal
> wait-queues at all for the page hash waiting. We'd use this two-level
> list: one list to find the page/bit thing, and then two lists within
> that fdor the wait-queues for just *that* page/bit.
>
> So no need for the 'key' stuff at all, because the page/bit data would
> be in the first data list, and the second list wouldn't have any of
> these traversal issues where you have to be careful and do it one
> entry at a time.
If I have the waker count and flags on the wait_page_queue structure
will that have been more acceptable?
Also I think patch 1 is still a good idea for a fail safe mechanism
in case there are other long wait list.
That said, I do think your suggested approach is cleaner. However, it
is a much more substantial change. Let me take a look and see if I
have any issues implementing it.
Tim
>
> Linus
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-08-25 22:19 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-25 16:13 [PATCH 1/2 v2] sched/wait: Break up long wake list walk Tim Chen
2017-08-25 16:13 ` [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit Tim Chen
2017-08-25 19:58 ` Linus Torvalds
2017-08-25 22:19 ` Tim Chen [this message]
2017-08-25 22:51 ` Linus Torvalds
2017-08-25 23:03 ` Linus Torvalds
2017-08-26 0:31 ` Linus Torvalds
2017-08-26 2:54 ` Linus Torvalds
2017-08-26 18:15 ` Linus Torvalds
2017-08-27 21:40 ` Linus Torvalds
2017-08-27 21:42 ` Linus Torvalds
2017-08-27 23:12 ` Linus Torvalds
2017-08-28 1:16 ` Nicholas Piggin
2017-08-28 1:29 ` Nicholas Piggin
2017-08-28 5:17 ` Linus Torvalds
2017-08-28 7:18 ` Nicholas Piggin
2017-08-28 14:51 ` Liang, Kan
2017-08-28 16:48 ` Linus Torvalds
2017-08-28 20:01 ` Tim Chen
2017-08-29 12:57 ` Liang, Kan
2017-08-29 16:01 ` Linus Torvalds
2017-08-29 16:13 ` Tim Chen
2017-08-29 16:24 ` Linus Torvalds
2017-08-29 16:57 ` Tim Chen
2017-09-14 2:12 ` Tim Chen
2017-09-14 2:27 ` Linus Torvalds
2017-09-14 16:50 ` Tim Chen
2017-09-14 17:00 ` Linus Torvalds
2017-09-14 16:39 ` Christopher Lameter
2017-08-29 16:17 ` Tim Chen
2017-08-29 16:22 ` Linus Torvalds
2017-08-25 17:46 ` [PATCH 1/2 v2] sched/wait: Break up long wake list walk Christopher Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f10f4c25-49c0-7ef5-55c2-769c8fd9bf90@linux.intel.com \
--to=tim.c.chen@linux.intel.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=dave@stgolabs.net \
--cc=ebiederm@xmission.com \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=kan.liang@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).