public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
To: Hugh Dickins <hughd@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
Date: Thu, 15 Jun 2023 14:11:32 +0200	[thread overview]
Message-ID: <20230615141132.63ac6e67@thinkpad-T15> (raw)
In-Reply-To: <fc5cd62e-d85f-36c3-ba37-db87e8b625d@google.com>

On Wed, 14 Jun 2023 14:59:33 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

[...]
> > 
> > It would be much more acceptable to simply not add back such fragments
> > to the list, and therefore risking some memory waste, than risking to
> > use an unstable mm in __tlb_remove_table(). The amount of wasted memory
> > in practice might also not be a lot, depending on whether the fragments
> > belong to the same and contiguous mapping.  
> 
> I agree that it's much better to waste a little memory (and only temporarily)
> than to freeze or corrupt.  But it's not an insoluble problem, I just didn't
> want to get into more change if there was already an answer that covers it.
> 
> I assume that the freed mm issue scared you away from testing my patch,
> so we don't know whether I got those mask conditionals right or not?

Correct, that scared me a lot :-). On the one hand, I do not feel familiar
enough with the common code logic that might need to be changed, or at
least understood, in order to judge if this is a problem and how it could
be addressed.

On the other hand, I am scared of subtle bugs that would not show
immediately, and hit us by surprise later.

Your thoughts about using RCU to free mm, in order to address this
"unstable mm" in __tlb_remove_table(), sound like a reasonable approach.
But again, with my lack of understanding, I am not sure if I can cope
with that.

So at least be prepared for call backs on that issue, not by RCU but
by mail :-)

> 
> > 
> > Also, we would not need to use page->pt_mm, and therefore make room for
> > page->pt_frag_refcount, which for some reason is (still) being used
> > in new v4 from Vishals "Split ptdesc from struct page" series...  
> 
> Vishal's ptdesc: I've been ignoring as far as possible, I'll have to
> respond on that later today, I'm afraid it will be putting this all into
> an intolerable straitjacket.  If ptdesc is actually making more space
> available by some magic, great: but I don't expect to find that is so.
> Anyway, for now, there it's impossible (for me anyway) to think of that
> at the same time as this.

I can totally relate to that. And I also had the feeling and hope that
ptdesc would give some relief on complex struct page (mis-)use, but did
not yet get into investigating further.

[...]
> > I dot not fully understand if / why we need the new HH bits. While working
> > on my patch it seemed to be useful for sorting out list_add/del in the
> > various cases. Here it only seems to be used for preventing double rcu_head
> > usage, is this correct, or am I missing something?  
> 
> Correct, I only needed the HH bits for avoiding double rcu_head usage (then
> implementing the second usage one the first has completed).  If you want to
> distinguish pte_free_defer() tables from page_table_free_rcu() tables, the
> HH bits would need to be considered in other places too, I imagine: that
> gets more complicated, I fear.

Yes, I have the same impression. My approach would prevent scary "unstable mm"
issues in __tlb_remove_table(), but probably introduce other subtle issue.
Or not so subtle, like potential double list_free(), as mentioned in my last
reply.

So it seems we have no completely safe approach so far, but I would agree
on going further with your approach for now. See below for patch comments.

[...]
> 
> I'm getting this reply back to you, before reviewing your patch below.
> Probably the only way I can review yours is to try it myself and compare.
> I'll look into it, once I understand c2c224932fd0.  But may have to write
> to Vishal first, or get the v2 of my series out: if only I could work out
> a safe and easy way of unbreaking s390...
> 
> Is there any chance of you trying mine?
> But please don't let it waste your time.

I have put it to some LTP tests now, and good news is that it does not show
any obvious issues. Only some deadlocks on mm->context.lock, but that can
easily be solved. Problem is that we have some users of that lock, who do
spin_lock() and not spin_lock_bh(). In the past, we had 3 different locks
in mm->context, and then combined them to use the same. Only the pagetable
list locks were taken with spin_lock_bh(), the others used spin_lock().

Of course, after combining them to use the same lock, it would have been
required to change the others to also use spin_lock_bh(), at least if there
was any good reason for using _bh in the pagetable list lock.
It seems there was not, which is why that mismatch was not causing any
issues so far, probably we had some reason which got removed in one of
the various reworks of that code...

With your patch, we do now have a reason, because __tlb_remove_table()
will usually be called in _bh context as RCU callback, and now also
takes that lock. So we also need this change (and two compile fixes,
marked below):

--- a/arch/s390/include/asm/tlbflush.h
+++ b/arch/s390/include/asm/tlbflush.h
@@ -79,12 +79,12 @@ static inline void __tlb_flush_kernel(vo
 
 static inline void __tlb_flush_mm_lazy(struct mm_struct * mm)
 {
-	spin_lock(&mm->context.lock);
+	spin_lock_bh(&mm->context.lock);
 	if (mm->context.flush_mm) {
 		mm->context.flush_mm = 0;
 		__tlb_flush_mm(mm);
 	}
-	spin_unlock(&mm->context.lock);
+	spin_unlock_bh(&mm->context.lock);
 }
 
 /*
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -102,14 +102,14 @@ struct gmap *gmap_create(struct mm_struc
 	if (!gmap)
 		return NULL;
 	gmap->mm = mm;
-	spin_lock(&mm->context.lock);
+	spin_lock_bh(&mm->context.lock);
 	list_add_rcu(&gmap->list, &mm->context.gmap_list);
 	if (list_is_singular(&mm->context.gmap_list))
 		gmap_asce = gmap->asce;
 	else
 		gmap_asce = -1UL;
 	WRITE_ONCE(mm->context.gmap_asce, gmap_asce);
-	spin_unlock(&mm->context.lock);
+	spin_unlock_bh(&mm->context.lock);
 	return gmap;
 }
 EXPORT_SYMBOL_GPL(gmap_create);
@@ -250,7 +250,7 @@ void gmap_remove(struct gmap *gmap)
 		spin_unlock(&gmap->shadow_lock);
 	}
 	/* Remove gmap from the pre-mm list */
-	spin_lock(&gmap->mm->context.lock);
+	spin_lock_bh(&gmap->mm->context.lock);
 	list_del_rcu(&gmap->list);
 	if (list_empty(&gmap->mm->context.gmap_list))
 		gmap_asce = 0;
@@ -260,7 +260,7 @@ void gmap_remove(struct gmap *gmap)
 	else
 		gmap_asce = -1UL;
 	WRITE_ONCE(gmap->mm->context.gmap_asce, gmap_asce);
-	spin_unlock(&gmap->mm->context.lock);
+	spin_unlock_bh(&gmap->mm->context.lock);
 	synchronize_rcu();
 	/* Put reference */
 	gmap_put(gmap);

These are the compile fixes:

> +static void pte_free_now1(struct rcu_read *head)

rcu_read -> rcu_head

> +{
> +	pte_free_half(head, 1);
> +}
> +
> +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
> +{
> +	unsigned int bit, mask;
> +	struct page *page;
> +
> +	page = virt_to_page(pgtable);
> +	WARN_ON_ONCE(page->pt_mm != mm);
> +	if (mm_alloc_pgste(mm)) {
> +		call_rcu(&page->rcu_head, pte_free_pgste)

Missing ";" at the end

> +		return;
> +	}

  reply	other threads:[~2023-06-15 12:11 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-29  6:11 [PATCH 00/12] mm: free retracted page table by RCU Hugh Dickins
2023-05-29  6:14 ` [PATCH 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-05-31 17:06   ` Jann Horn
2023-06-02  2:50     ` Hugh Dickins
2023-06-02 14:21       ` Jann Horn
2023-05-29  6:16 ` [PATCH 02/12] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-05-29 13:56   ` Matthew Wilcox
     [not found]   ` <ZHeg3oRljRn6wlLX@ziepe.ca>
2023-06-02  5:35     ` Hugh Dickins
2023-05-29  6:17 ` [PATCH 03/12] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-05-29  6:18 ` [PATCH 04/12] powerpc: assert_pte_locked() " Hugh Dickins
2023-05-29  6:20 ` [PATCH 05/12] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-05-29 14:02   ` Matthew Wilcox
2023-05-29 14:36     ` Hugh Dickins
2023-06-01 13:57       ` Gerald Schaefer
2023-06-02  6:38         ` Hugh Dickins
2023-06-02 14:20     ` Jason Gunthorpe
2023-06-06  3:40       ` Hugh Dickins
2023-06-06 18:23         ` Jason Gunthorpe
2023-06-06 19:03           ` Peter Xu
2023-06-06 19:08             ` Jason Gunthorpe
2023-06-07  3:49               ` Hugh Dickins
2023-05-29  6:21 ` [PATCH 06/12] sparc: " Hugh Dickins
2023-06-06  3:46   ` Hugh Dickins
2023-05-29  6:22 ` [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async() Hugh Dickins
2023-06-06  5:11   ` Hugh Dickins
2023-06-06 18:39     ` Jason Gunthorpe
2023-06-08  2:46       ` Hugh Dickins
2023-06-06 19:40     ` Gerald Schaefer
2023-06-08  3:35       ` Hugh Dickins
2023-06-08 13:58         ` Jason Gunthorpe
2023-06-08 15:47         ` Gerald Schaefer
2023-06-13  6:34           ` Hugh Dickins
2023-06-14 13:30             ` Gerald Schaefer
2023-06-14 21:59               ` Hugh Dickins
2023-06-15 12:11                 ` Gerald Schaefer [this message]
2023-06-15 20:06                   ` Hugh Dickins
2023-06-16  8:38                     ` Gerald Schaefer
2023-06-15 12:34                 ` Jason Gunthorpe
2023-06-15 21:09                   ` Hugh Dickins
2023-06-16 12:35                     ` Jason Gunthorpe
2023-05-29  6:23 ` [PATCH 08/12] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-06-01 13:31   ` Jann Horn
     [not found]   ` <ZHekpAKJ05cr/GLl@ziepe.ca>
2023-06-02  6:03     ` Hugh Dickins
2023-06-02 12:15       ` Jason Gunthorpe
2023-05-29  6:25 ` [PATCH 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-05-29 23:26   ` Peter Xu
2023-05-31  0:38     ` Hugh Dickins
2023-05-31 15:34   ` Jann Horn
     [not found]     ` <ZHe0A079X9B8jWlH@x1n>
2023-05-31 22:18       ` Jann Horn
2023-06-01 14:06         ` Jason Gunthorpe
2023-06-06  6:18     ` Hugh Dickins
2023-05-29  6:26 ` [PATCH 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-05-31 17:25   ` Jann Horn
2023-06-02  5:11     ` Hugh Dickins
2023-05-29  6:28 ` [PATCH 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-05-29  6:30 ` [PATCH 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins
2023-05-31 17:59 ` [PATCH 00/12] mm: free retracted page table by RCU Jann Horn
2023-06-02  4:37   ` Hugh Dickins
2023-06-02 15:26     ` Jann Horn
2023-06-06  6:28       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230615141132.63ac6e67@thinkpad-T15 \
    --to=gerald.schaefer@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hughd@google.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-s390@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox