public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Alexey Gladkov <legion@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexey Gladkov <gladkov.alexey@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v3 3/6] rculist: Add hlist_swap_before_rcu
Date: Mon, 27 Apr 2020 09:28:34 -0500	[thread overview]
Message-ID: <878sihgfzh.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CAHk-=whvktUC9VbzWLDw71BHbV4ofkkuAYsrB5Rmxnhc-=kSeQ@mail.gmail.com> (Linus Torvalds's message of "Sun, 26 Apr 2020 10:40:17 -0700")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, Apr 26, 2020 at 7:14 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> To support this add hlist_swap_before_rcu.  An hlist primitive that
>> will allow swaping the leading sections of two tasks.  For exchanging
>> the task pids it will just be swapping the hlist_heads of two single
>> entry lists.  But the functionality is more general.
>
> So I have no problems with the series now - the code is much more
> understandable. Partly because of the split-up, partly because of the
> comments, and partly because you explained the special case and why it
> was a valid thing to do...
>
> However, I did start thinking about this case again.
>
> I still don't think the "swap entry" macro is necessarily useful in
> _general_ - any time it's an actual individual entry, that swap macro
> doesn't really work.

But it isn't a "swap entry" macro/function.  I did not even attempt
to make it a "swap entry" function.

I made a chop two lists into two and swap the pieces function.

> So the only reason it works here is because you're actually swapping
> the whole list.
>
> But that, in turn, shouldn't be using that "first node" model at all,
> it should use the hlist_head. That would have made it a lot more
> obvious what is actually going on to me.
>
> Now, the comment very much talks about the head case, but the code
> still looks like it's swapping a non-head thing.
>
> I guess the code technically _works_ with "swap two list ends", but
> does that actually really make sense as an operation?

As an operation yes.  Will anyone else want that operation I don't know.

> So I no longer hate how this patch looks, but I wonder if we should
> just make the whole "this node is the *first* node" a bit more
> explicit in both the caller and in the swapping code.
>
> It could be as simple as replacing just the conceptual types and
> names, so instead of some "pnode1" double-indirect node pointer, we'd
> have
>
>         struct hlist_head *left_head = container_of(left->pprev,
> struct hlist_head, first);
>         struct hlist_head *right_head = container_of(right->pprev,
> struct hlist_head, first);
>
> and then the code would do
>
>         rcu_assign_pointer(right_head->first, left);
>         rcu_assign_pointer(left_head->first, right);
>         WRITE_ONCE(left->pprev,  &right_head->first);
>         WRITE_ONCE(right->pprev,  &left_head->first);
>
> which should generate the exact same code, but makes it clear that
> what we're doing is switching the whole hlist when given the first
> entries.
>
> Doesn't that make what it actually does a lot more understandable?

Understandable is a bit subjective. I think having a well defined hlist
operation I can call makes things more understandable.

I think the getting the list head as:
"head = &task->thread_pid->tasks[PIDTYPE_PID];" is more understandable
and less risky than container_of.

My concern and probably unreasonbable as this is a slow path
with getting the list heads after looking up the pid is that it seems
to add a wait for an additional cache line to load before anything can
happen.

The only way I really know to make this code much more understandable is
to remove the lists entirely for this case.  But that is a much larger
change and it is not clear that it makes the kernel code overall better.
I stared at that for a while and it is an interesting follow on but not
something I want or we even can do before exchange_tids is in place.

> The
> *pnode1/pnode2 games are somewhat opaque, but with that type and name
> change and using "container_of()", the code now fairly naturally reads
> as "oh, we're changing the first pointers in the list heads, and
> making the nodes point back to them" .
>
> Again - the current function _works_ with swapping two hlists in the
> middle (not two entries - it swaps the whole list starting at that
> entry!), so your current patch is in some ways "more generic". I'm
> just suggesting that the generic case doesn't make much sense, and
> that the "we know the first entries, swap the lists" actually is what
> the real use is, and writing it as such makes the code easier to
> understand.

Yep.  That is waht I designed it to do.  I sort of went the other
direction when writing this.  I could start with the list heads and swap
the rest of the lists and get the same code.  But it looked like it
would be a little slower to find the hlist_heads, and I couldn't think
of a good name for the function.  So I figured if I was writing a
fucntion for this case I would write one that was convinient.

For understandability that is my real challenge what is a good name
that people can read and understand what is happening for this swapping
function.

> But I'm not going to insist on this, so this is more an RFC. Maybe
> people disagree, and/or have an actual use case for that "break two
> hlists in the middle, swap the ends" that I find unlikely...
>
> (NOTE: My "convert to hlist_head" code _works_ for that case too
> because the code generation is the same! But it would be really really
> confusing for that code to be used for anything but the first entry).

Yes.

I am open to improvements.  Especially in the naming.

Would hlists_swap_heads_rcu be noticably better?

Eric


  reply	other threads:[~2020-04-27 14:32 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-19 14:10 [PATCH v12 0/7] proc: modernize proc to support multiple private instances Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 1/7] proc: rename struct proc_fs_info to proc_fs_opts Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 2/7] proc: allow to mount many instances of proc in one pid namespace Alexey Gladkov
2020-04-23 11:28   ` [PATCH v13 " Alexey Gladkov
2020-04-23 12:16     ` Eric W. Biederman
2020-04-23 20:01       ` Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 3/7] proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 4/7] proc: add option to mount only a pids subset Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 5/7] docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 6/7] proc: use human-readable values for hidepid Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 7/7] proc: use named enums for better readability Alexey Gladkov
     [not found] ` <87ftcv1nqe.fsf@x220.int.ebiederm.org>
2020-04-23 17:54   ` [PATCH v2 0/2] proc: Calling proc_flush_task exactly once per task Oleg Nesterov
2020-04-23 19:38     ` Eric W. Biederman
2020-04-23 19:39   ` [PATCH v2 1/2] proc: Use PIDTYPE_TGID in next_tgid Eric W. Biederman
2020-04-24 17:29     ` Oleg Nesterov
2020-04-23 19:39   ` [PATCH v2 2/2] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman
2020-04-23 20:28     ` Linus Torvalds
2020-04-24  3:33       ` Eric W. Biederman
2020-04-24 18:02         ` Linus Torvalds
2020-04-24 18:46           ` Linus Torvalds
2020-04-24 19:51           ` Eric W. Biederman
2020-04-24 20:10             ` Linus Torvalds
2020-04-24 17:39     ` Oleg Nesterov
2020-04-24 18:10       ` Eric W. Biederman
2020-04-24 20:50       ` [PATCH] proc: Put thread_pid in release_task not proc_flush_pid Eric W. Biederman
     [not found]       ` <87mu6ymkea.fsf_-_@x220.int.ebiederm.org>
     [not found]         ` <87blnemj5t.fsf_-_@x220.int.ebiederm.org>
2020-04-26 17:22           ` [PATCH v3 2/6] posix-cpu-timers: Use PIDTYPE_TGID to simplify the logic in lookup_task Oleg Nesterov
2020-04-27 11:51             ` Eric W. Biederman
2020-04-28 18:03               ` Oleg Nesterov
2020-04-27 10:32           ` Thomas Gleixner
2020-04-27 19:46             ` Eric W. Biederman
     [not found]         ` <875zdmmj4y.fsf_-_@x220.int.ebiederm.org>
2020-04-26 17:40           ` [PATCH v3 3/6] rculist: Add hlist_swap_before_rcu Linus Torvalds
2020-04-27 14:28             ` Eric W. Biederman [this message]
2020-04-27 20:27               ` Linus Torvalds
2020-04-28 12:16                 ` [PATCH v4 0/2] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman
2020-04-28 12:18                   ` [PATCH v4 1/2] rculist: Add hlists_swap_heads_rcu Eric W. Biederman
2020-04-28 12:19                   ` [PATCH v4 2/2] proc: Ensure we see the exit of each process tid exactly once Eric W. Biederman
2020-04-28 16:53                   ` [PATCH v4 0/2] proc: Ensure we see the exit of each process tid exactly Linus Torvalds
2020-04-28 17:55                     ` Eric W. Biederman
2020-04-28 18:55                     ` Eric W. Biederman
2020-04-28 19:36                       ` Linus Torvalds
2020-04-28 18:05                   ` Oleg Nesterov
2020-04-28 18:54                     ` Eric W. Biederman
2020-04-28 21:39                     ` [PATCH v1 0/4] signal: Removing has_group_leader_pid Eric W. Biederman
2020-04-28 21:45                       ` [PATCH v1 1/4] posix-cpu-timer: Tidy up group_leader logic in lookup_task Eric W. Biederman
2020-04-28 21:48                       ` [PATCH 2/4] posix-cpu-timer: Unify the now redundant code " Eric W. Biederman
2020-04-28 21:53                       ` [PATCH v1 3/4] exec: Remove BUG_ON(has_group_leader_pid) Eric W. Biederman
2020-04-28 21:56                       ` [PATCH v4 4/4] signal: Remove has_group_leader_pid Eric W. Biederman
2020-04-30 11:54                       ` [PATCH v1 0/3] posix-cpu-timers: Use pids not tasks in lookup Eric W. Biederman
2020-04-30 11:55                         ` [PATCH v1 1/3] posix-cpu-timers: Extend rcu_read_lock removing task_struct references Eric W. Biederman
2020-04-30 11:56                         ` [PATCH v1 2/3] posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type Eric W. Biederman
2020-04-30 11:56                         ` [PATCH v1 3/3] posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock Eric W. Biederman
     [not found]         ` <87h7x6mj6h.fsf_-_@x220.int.ebiederm.org>
2020-04-27  9:43           ` [PATCH v3 1/6] posix-cpu-timers: Always call __get_task_for_clock holding rcu_read_lock Thomas Gleixner
2020-04-27 11:53             ` Eric W. Biederman
     [not found]         ` <87r1w8ete7.fsf@x220.int.ebiederm.org>
2020-04-27 20:23           ` [PATCH v3] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878sihgfzh.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gladkov.alexey@gmail.com \
    --cc=legion@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox