public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	linux-s390@vger.kernel.org
Subject: Re: prospective 13/12 s390 pgtable_list patch
Date: Fri, 23 Jun 2023 13:53:43 -0700 (PDT)	[thread overview]
Message-ID: <232e20a5-ef6-41cf-4073-92fcb3a01453@google.com> (raw)
In-Reply-To: <ZJVl7ZJiborhmtYh@li-008a6a4c-3549-11b2-a85c-c5cc2836eea2.ibm.com>

On Fri, 23 Jun 2023, Alexander Gordeev wrote:
> On Thu, Jun 22, 2023 at 10:49:43PM -0700, Hugh Dickins wrote:
> > Hi Gerald,
> > 
> > It's that moment you've been dreading: I'm hoping that you can, please,
> > take a look at the patch below, and try building and running with it,
> > on top of the v2 series of 12 I sent out on Tuesday.
> > 
> > If this seems okay to you, I'll send it properly as 13/12 of that series,
> > to the full Cc list; but of course you may find I've missed typos and worse
> > - please don't waste your time on it if it's rubbish, but so far as I can
> > tell, it is complete and ready now.
> > 
> > Thanks!
> > Hugh
> 
> Hi Hugh,
> 
> Gerald is off until Monday and I think is not able to answer right now.

Thanks a lot for stepping in, Alexander.

> 
> We had discussions with regard to how to better approach your series and
> did not come to a conclusion unfortunatelly.
> 
> Gerald had several concerns - one of them is global mm_pgtable_list_lock,
> wich is luckily avoided with this follow-up patch. But there were others,
> which I am not able to articulate in detail.
> 
> While you are doing an outstanding job in trying to adjust our fragmented
> page tables reuse scheme, one of the ideas emerged was to actually give it
> up: partially or may be even fully. That is - not to reuse page tables
> returned via pte_free_defer() or not to reuse them at all. To assess this
> possible new approaches some statistics is needed and am working on a
> prototype that would allow collecting it.
> 
> Note, our existing code is extremly complicated already and we had hard
> time fixing (at least one) ugly race related to that. With the changes
> you suggest that complexity (to me personally) multiplies. But that well
> could be the other way around and I am just not smart enough to grasp it.
> At least the claim that page_table_free() no longer needs the two-step
> release indicates that.

Yes, I had to cool myself down to a low temperature, and think about it
for several hours before arriving at that conclusion.  It would be nice
if I could point to one fact which explains it convincingly (something
along the lines of "look, we already took it off the list in that case");
but didn't manage to put it into words, and ended up deciding that we
each have to do our own thinking to convince ourselves on that issue.

> 
> I am sorry that it is probably not the status you would like to hear,

Not a problem for me at all: you're absolutely right to question whether
the existing, and the added, complexity is worth it - all of us who have
looked into that code have wondered the same.

Initially I refused to even try to get into it; but in the end was quite
proud that I had, given time, managed to work with it.  But no problem
if that work is quickly discarded in favour of simplicity.

> but I still wonder what is your opinion on that do-not-reuse-fragments

I don't want to sway you one way or the other on that: I just want to
work with whatever you decide.  I know Matthew Wilcox would prefer if
powerpc and s390 went about things in the same way (but they do have
different constraints, so I don't necessarily agree); but I did not
feel able to persuade powerpc to adopt s390's more complex approach,
nor to persuade s390 to downgrade to powerpc's simpler approach.

> approach? Would it simplify pte_free_defer() or had no effect?

It would certainly simplify it a lot.  Whether it would just be a
matter of deleting my attempt to list_add() from __tlb_remove_table(),
so restoring the per-mm lock, and forgetting the SLAB_TYPESAFE_BY_RCU;
or whether it would go further, and most of the PP-AA bit tracking fall
away, I cannot predict - depends rather on who does the job, and how
far they choose to take it.

Two notes I want to throw into the mix:

One, FWIW my guess is that for most mms, the s390 PP-AA tracking adds very
little value; but without it, perhaps there is some mm, some workload, which
degrades to the point of using only half of the pgtable memory it allocates.

Two, maybe you'll see this as a further complication, to avoid getting into,
rather than a simplification: but it occurred to me while working here that
s390 has no need to keep the pgtable_trans_huge_deposit/withdraw() code in
mm/pgtable.c separate from the mm/pgalloc.c code.  They can use just the
one list, and deposit/withdraw simply keep a count of how many immediately
usable free pgtables must always be kept in reserve on that list.  (But
this would not be true at all for powerpc.)

> 
> Anyway, that is just another option and I will try your patch.

Thank you, please do, if you have the time: my series does need some
kind of s390 solution, depending on whatever s390 has in its tree at
the time: for now I should at least be showing the 13/12 (preferably
known to build and appearing to work!), even if all it does is help
people to say "no, that's too much".

Hugh

  reply	other threads:[~2023-06-23 20:55 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-23  5:49 prospective 13/12 s390 pgtable_list patch Hugh Dickins
2023-06-23  9:29 ` Alexander Gordeev
2023-06-23 20:53   ` Hugh Dickins [this message]
2023-06-26 15:22     ` Alexander Gordeev
2023-06-27  3:45       ` Hugh Dickins
2023-06-28 19:15 ` Gerald Schaefer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=232e20a5-ef6-41cf-4073-92fcb3a01453@google.com \
    --to=hughd@google.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-s390@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox