From: Dave Jones <davej@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Hillf Danton <dhillf@gmail.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: unused swap offset / bad page map.
Date: Mon, 26 Aug 2013 18:28:33 -0400
Message-ID: <20130826222833.GA24320@redhat.com>
In-Reply-To: <alpine.LNX.2.00.1308261448490.4982@eggly.anvils>

On Mon, Aug 26, 2013 at 03:08:45PM -0700, Hugh Dickins wrote:
 
 > > That said, google does find "swap_free: Unused swap offset entry"
 > > reports from over the years. Most of them seem to be single-bit
 > > errors, though (ie when the entry is 00000100 or similar I'm more
 > > inclined to blame a bit error
 > 
 > Yes, historically they have usually represented either single-bit
 > errors, or corruption of page tables by other kernel data.  The
 > swap subsystem discovers it, but it's rarely an error of swap.
 
Just to rule out bad hardware, I've seen this on two systems
(admittedly the exact same spec, but still..)

 > So I don't care for Dave's suggestion much earlier in this thread,
 > that swapoff should fail with -EINVAL if there has been a bad page
 > taint: that doesn't necessarily interfere with swapoff at all.
 > 
 > And besides, swapoff is killable: yes, if counts go wrong, it
 > can cycle around endlessly, but it checks for signal_pending()
 > each time around the loop.

It might be killable, but if I've done /sbin/reboot, and the
kernel dies in sys_swapoff because of the corruption, I won't
get a chance to kill it, because at that point the shutdown process
has killed my shell, sshd, and just about everything else.
It means a grumpy walk to the other side of the house to prod a
reset button.  So yeah, it might not be a mergeable thing, but
at least while bisecting it's pretty much a must-have.
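
For illustration, a similar guard can be approximated from user space
by checking the taint mask before touching swap. This is only a sketch
and not the in-kernel change discussed above: it assumes the mask is
exposed at /proc/sys/kernel/tainted with bit 5 as the bad-page ('B')
flag, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Bit 5 of the taint mask is assumed to be the bad-page ('B') flag. */
#define TAINT_BAD_PAGE_BIT 5

/* Returns 1 if the bad-page taint bit is set in mask, 0 otherwise. */
int taint_has_bad_page(unsigned long mask)
{
	return (int)((mask >> TAINT_BAD_PAGE_BIT) & 1UL);
}

/*
 * Reads a decimal taint mask from path (for example
 * /proc/sys/kernel/tainted). Returns 0 on success, -1 on failure.
 */
int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", mask) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A bisect script could call read_taint_mask() before rebooting and skip
swapoff when taint_has_bad_page() returns 1.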

 > I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
 > a line in mremap which worries me.  That set_pte_at() is operating
 > on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
 > prone to corrupt a swap entry.
 > 
 > I've not tried matching up bits with Dave's reports, and just going
 > into a meeting now, but this patch looks worth a try: probably Cyrill
 > can improve it meanwhile to what he actually wants there (I'm
 > surprised anything special is needed for just moving a pte).
 > 
 > Hugh
 > 
 > --- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
 > +++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
 > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
 >  			continue;
 >  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 >  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 > -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
 > +		set_pte_at(mm, new_addr, new_pte, pte);
 >  	}
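
To make the concern above concrete, here is a toy model of how setting
a soft-dirty bit on a non-present pte can change the swap entry it
encodes. The bit positions and all toy_* names are assumptions made up
for this sketch, not the real x86 layout, and the guarded variant is
just one possible approach rather than what the final fix should be.

```c
#include <stdint.h>

/* Illustrative layout only; these positions are assumptions. */
#define TOY_PAGE_PRESENT	(1ULL << 0)
#define TOY_PAGE_SOFT_DIRTY	(1ULL << 11)
#define TOY_SWP_TYPE_SHIFT	1
#define TOY_SWP_TYPE_MASK	0x1fULL
#define TOY_SWP_OFFSET_SHIFT	9

typedef uint64_t toy_pte_t;

/* Encode a swap entry (type, offset) into a non-present pte. */
toy_pte_t toy_mk_swap_pte(unsigned type, uint64_t offset)
{
	return ((type & TOY_SWP_TYPE_MASK) << TOY_SWP_TYPE_SHIFT) |
	       (offset << TOY_SWP_OFFSET_SHIFT);
}

/* Decode the swap type from a non-present pte. */
unsigned toy_swp_type(toy_pte_t pte)
{
	return (unsigned)((pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK);
}

/* Decode the swap offset from a non-present pte. */
uint64_t toy_swp_offset(toy_pte_t pte)
{
	return pte >> TOY_SWP_OFFSET_SHIFT;
}

/* Unconditional variant: sets the bit whatever the pte holds. */
toy_pte_t toy_mksoft_dirty(toy_pte_t pte)
{
	return pte | TOY_PAGE_SOFT_DIRTY;
}

/* Guarded variant: only touches ptes that map a present page. */
toy_pte_t toy_mksoft_dirty_if_present(toy_pte_t pte)
{
	if (pte & TOY_PAGE_PRESENT)
		return pte | TOY_PAGE_SOFT_DIRTY;
	return pte;
}
```

In this model, a swap pte built with type 1 and offset 0x100 reads back
as offset 0x104 after toy_mksoft_dirty(), because the flag bit lands
inside the offset field. The guarded variant leaves it unchanged.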

I'll give this a shot once I'm done with the bisect.

	Dave
