Re: [PATCH] s390/mm: implement software dirty bits

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org, linux-s390@vger.kernel.org,
	Mel Gorman <mgorman@suse.de>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Subject: Re: [PATCH] s390/mm: implement software dirty bits
Date: Thu, 7 Feb 2013 11:18:38 -0800	[thread overview]
Message-ID: <20130207111838.27fea18f@mschwide> (raw)
In-Reply-To: <alpine.LNX.2.00.1302061504340.7256@eggly.anvils>

On Wed, 6 Feb 2013 16:20:40 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> Martin, I'd like to say Applauded-by: Hugh Dickins <hughd@google.com>
> but I do have one reservation: the PageDirty business you helpfully
> draw attention to in your description above.
> 
> That makes me nervous, having a PageDirty test buried down there in
> one architecture's mk_pte().  Particularly since I know the PageDirty
> handling on anon/swap pages is rather odd: it works, but it's hard to
> justify some of the SetPageDirtys (when we add to swap, AND when we
> remove from swap): partly a leftover from 2.4 days, when vmscan worked
> differently, and we had to be more careful about freeing modified pages.

I tried to solved the whole thing with arch level code only. The PageDirty
check in mk_pte is essential to avoid additional protection faults for
tmpfs/shmem. 
 
> I did a patch a year or two ago, mainly for debugging some particular
> issue by announcing "Bad page state" if ever a dirty page is freed, in
> which I had to tidy that up.  Now, I don't have any immediate intention
> to resurrect that patch, but I'm afraid that if I did, I might interfere
> with your optimization in s390's mk_pte() without realizing it.
> 
> > --- a/arch/s390/include/asm/page.h
> > +++ b/arch/s390/include/asm/page.h
> > ...
> > @@ -1152,8 +1190,13
> >  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
> >  {
> >  	unsigned long physpage = page_to_phys(page);
> > +	pte_t __pte = mk_pte_phys(physpage, pgprot);
> >  
> > -	return mk_pte_phys(physpage, pgprot);
> > +	if ((pte_val(__pte) & _PAGE_SWW) && PageDirty(page)) {
> > +		pte_val(__pte) |= _PAGE_SWC;
> > +		pte_val(__pte) &= ~_PAGE_RO;
> > +	}
> > +	return __pte;
> >  }
> 
> Am I right to think that, once you examine the mk_pte() callsites,
> this actually would not be affecting anon pages, nor accounted file
> pages, just tmpfs/shmem or ramfs pages read-faulted into a read-write
> shared vma?  (That fits with what you say above.)  That it amounts to
> the patch below - which I think I would prefer, because it's explicit?
> (There might be one or two other places it makes a difference e.g.
> replacing a writable migration entry, but those too uncommon to matter.)

Anon page and accounted file pages won't need the mk_pte optimization,
that is there for tmpfs/shmem. We could do that in common code as well,
to make the dependency on PageDirty more obvious.

> --- 3.8-rc6/mm/memory.c	2013-01-09 19:25:05.028321379 -0800
> +++ linux/mm/memory.c	2013-02-06 15:01:17.904387877 -0800
> @@ -3338,6 +3338,10 @@ static int __do_fault(struct mm_struct *
>  				dirty_page = page;
>  				get_page(dirty_page);
>  			}
> +#ifdef CONFIG_S390
> +			else if (pte_write(entry) && PageDirty(page))
> +				pte_mkdirty(entry);
> +#endif
>  		}
>  		set_pte_at(mm, address, page_table, entry);
> 
> And then I wonder, is that something we should do on all architectures?
> On the one hand, it would save a hardware fault when and if the pte is
> dirtied later; on the other hand, it seems wrong to claim pte dirty when
> not (though I didn't find anywhere that would care).

I don't like the fact that we are adding another CONFIG_S390, if we could
pre-dirty the pte for all architectures that would be nice. It has no
ill effects for s390 to make the pte dirty, I can think of no reason
why it should hurt for other architectures.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-02-07 19:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-05 18:12 [PATCH] software dirty bits for s390 Martin Schwidefsky
2013-02-05 18:12 ` Martin Schwidefsky
2013-02-05 19:28   ` Martin Schwidefsky
2013-02-05 18:12 ` [PATCH] s390/mm: implement software dirty bits Martin Schwidefsky
2013-02-06 11:21   ` Mel Gorman
2013-02-06 18:16     ` Martin Schwidefsky
2013-02-07  0:20   ` Hugh Dickins
2013-02-07 19:18     ` Martin Schwidefsky [this message]
2013-02-11 14:27       ` Martin Schwidefsky
2013-02-11 22:08         ` Hugh Dickins
2013-02-12  8:46           ` Martin Schwidefsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130207111838.27fea18f@mschwide \
    --to=schwidefsky@de.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.