linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	Kan Liang <kan.liang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
	Christopher Lameter <cl@linux.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
Date: Fri, 25 Aug 2017 15:19:53 -0700	[thread overview]
Message-ID: <f10f4c25-49c0-7ef5-55c2-769c8fd9bf90@linux.intel.com> (raw)
In-Reply-To: <CA+55aFyErsNw8bqTOCzcrarDZBdj+Ev=1N3sV-gxtLTH03bBFQ@mail.gmail.com>

On 08/25/2017 12:58 PM, Linus Torvalds wrote:
> On Fri, Aug 25, 2017 at 9:13 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>> Now that we have added breaks in the wait queue scan and allow bookmark
>> on scan position, we put this logic in the wake_up_page_bit function.
> 
> Oh, _this_ is the other patch you were talking about. I thought it was
> the NUMA counter threashold that was discussed around the same time,
> and that's why I referred to Mel.
> 
> Gods, _this_ patch is ugly.  No, I'm not happy with it at all. It
> makes that wait_queue_head much bigger, for this disgusting one use.
> 
> So no, this is no good.
> 
> Now, maybe the page_wait_table[] itself could be changed to be
> something that is *not* just the wait-queue head.
> 
> But if we need to change the page_wait_table[] itself to have more
> information, then we should make it not be a wait-queue at all, we
> should make it be a list of much more specialized entries, indexed by
> the {page,bit} tuple.
> 
> And once we do that, we probably *could* use something like two
> specialized lists: one that is wake-all, and one that is wake-one.
> 
> So you'd have something like
> 
>     struct page_wait_struct {
>         struct list_node list;
>         struct page *page;
>         int bit;
>         struct llist_head all;
>         struct llist_head exclusive;
>     };
> 
> and then the "page_wait_table[]" would be just an array of
> 
>     struct page_wait_head {
>         spinlock_t lock;
>         struct hlist_head list;
>     };
> 
> and then the rule is:
> 
>  - each page/bit combination allocates one of these page_wait_struct
> entries when somebody starts waiting for it for the first time (and we
> use the spinlock in the page_wait_head to serialize that list).
> 
>  - an exclusive wait (ie somebody who waits to get the bit for
> locking) adds itself to the 'exclusive' llist
> 
>  - non-locking waiters add themselves to the 'all' list
> 
>  - we can use llist_del_all() to remove the 'all' list and then walk
> it and wake them up without any locks
> 
>  - we can use llist_del_first() to remove the first exclusive waiter
> and wait _it_ up without any locks.
> 
> Hmm? How does that sound? That means that we wouldn't use the normal
> wait-queues at all for the page hash waiting. We'd use this two-level
> list: one list to find the page/bit thing, and then two lists within
> that fdor the wait-queues for just *that* page/bit.
> 
> So no need for the 'key' stuff at all, because the page/bit data would
> be in the first data list, and the second list wouldn't have any of
> these traversal issues where you have to be careful and do it one
> entry at a time.

If I have the waker count and flags on the wait_page_queue structure
will that have been more acceptable?

Also I think patch 1 is still a good idea for a fail safe mechanism
in case there are other long wait list.

That said, I do think your suggested approach is cleaner.  However, it
is a much more substantial change.  Let me take a look and see if I
have any issues implementing it.

Tim



> 
>                  Linus
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-08-25 22:19 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25 16:13 [PATCH 1/2 v2] sched/wait: Break up long wake list walk Tim Chen
2017-08-25 16:13 ` [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit Tim Chen
2017-08-25 19:58   ` Linus Torvalds
2017-08-25 22:19     ` Tim Chen [this message]
2017-08-25 22:51       ` Linus Torvalds
2017-08-25 23:03         ` Linus Torvalds
2017-08-26  0:31           ` Linus Torvalds
2017-08-26  2:54             ` Linus Torvalds
2017-08-26 18:15               ` Linus Torvalds
2017-08-27 21:40                 ` Linus Torvalds
2017-08-27 21:42                   ` Linus Torvalds
2017-08-27 23:12                   ` Linus Torvalds
2017-08-28  1:16                     ` Nicholas Piggin
2017-08-28  1:29                       ` Nicholas Piggin
2017-08-28  5:17                         ` Linus Torvalds
2017-08-28  7:18                           ` Nicholas Piggin
2017-08-28 14:51                 ` Liang, Kan
2017-08-28 16:48                   ` Linus Torvalds
2017-08-28 20:01                     ` Tim Chen
2017-08-29 12:57                     ` Liang, Kan
2017-08-29 16:01                       ` Linus Torvalds
2017-08-29 16:13                         ` Tim Chen
2017-08-29 16:24                           ` Linus Torvalds
2017-08-29 16:57                             ` Tim Chen
2017-09-14  2:12                             ` Tim Chen
2017-09-14  2:27                               ` Linus Torvalds
2017-09-14 16:50                                 ` Tim Chen
2017-09-14 17:00                                   ` Linus Torvalds
2017-09-14 16:39                               ` Christopher Lameter
2017-08-29 16:17                     ` Tim Chen
2017-08-29 16:22                       ` Linus Torvalds
2017-08-25 17:46 ` [PATCH 1/2 v2] sched/wait: Break up long wake list walk Christopher Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f10f4c25-49c0-7ef5-55c2-769c8fd9bf90@linux.intel.com \
    --to=tim.c.chen@linux.intel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=kan.liang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).