From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch <linux-arch@vger.kernel.org>
Subject: Re: Can people please check this patch out for cross-arch issues
Date: Tue, 10 Jul 2018 08:12:49 +0200 [thread overview]
Message-ID: <20180710081249.73ccaac5@mschwideX1> (raw)
In-Reply-To: <CA+55aFxWZuFNHKpE0iA+vDJHTyPofSX3BPPOt2UUB2zXSSBZHQ@mail.gmail.com>
On Mon, 9 Jul 2018 16:57:46 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> We have a funny new bugzilla entry for an issue that is 12 years old,
> and not really all that critical, but one that *can* become a problem
> for people who do very specific things.
>
> What happens is that 'fork()' will cause a re-try (with
> ERESTARTNOINTR) if a signal has come in between the point where the
> fork() started, but before we add the process to the process table.
> The reason is that the signal possibly *should* affect the new child
> process too, but it was never queued to it because we were obviously
> in the process of creating it.
>
> That's normally entirely a non-issue, and I don't think anybody ever
> imagined it could matter in practice, but apparently there are loads
> where this becomes problematic.
>
> See kernel bugzilla at
>
> https://bugzilla.kernel.org/show_bug.cgi?id=200447
>
> which has a trial balloon patch for this issue already, to at least
> limit that retry to only signals that actually might affect the child
> (ie not any random signals sent explicitly and directly only to the
> parent process).
>
> HOWEVER.
>
> The very first thing I noticed while looking at this was that one of
> the more expensive parts of the fork() retry is actually marking all
> the parent page tables read-only. Now, it's one of _many_ expensive
> parts, and it's not nearly as expensive as all the reference counting
> we do for each page, but it's actually very easy to avoid. When we
> have repeated fork() calls, there's just no point in repeatedly
> marking pages read-only.
>
> This the attached one-liner patch.
>
> The reason I'm sending it to the arch people is that while this is a
> totally trivial patch on x86 ("pte_write()" literally tests exactly
> the same bit that "pte_wrprotect()" and "ptep_set_wrprotect()"
> clears), the same is not necessarily always true on other
> architectures.
>
> Some other architectures make "ptep_set_wrprotect()" do more than just
> clear the one bit we test with "pte_write()".
>
> Honestly, I don't think it could possibly matter: if "pte_write()"
> returns false, then whatever "ptep_set_writeprotect()" does can not
> really matter (at least for a COW mapping). But I thought I'd send
> this out for comments anyway, despite how trivial it looks.
>
> So. Comments?
>
> It doesn't make a huge difference, honestly, and the real fix for the
> "retry too eagerly" is completely different, but at the same time this
> one-liner trivial fix does feel like the RightThing(tm) regardless.
s390 does software dirty-bit tracking which makes the logic for the
page protection more bit more complicated. But !pte_write() always
implies not writable.
As a follow up patch we could remove the pte_write() check in the
s390 version of ptep_set_wrprotect.
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5ab636089c60..e8f9f3cc676a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1069,8 +1069,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
{
pte_t pte = *ptep;
- if (pte_write(pte))
- ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
+ ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
}
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
prev parent reply other threads:[~2018-07-10 6:12 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-09 23:57 Can people please check this patch out for cross-arch issues Linus Torvalds
2018-07-10 4:49 ` Nicholas Piggin
2018-07-10 4:49 ` Nicholas Piggin
2018-07-10 6:12 ` Martin Schwidefsky [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180710081249.73ccaac5@mschwideX1 \
--to=schwidefsky@de.ibm.com \
--cc=linux-arch@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox