* [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree
From: akpm @ 2015-08-24 18:41 UTC
To: hughd, andreyknvl, cesarb, dvyukov, glider, hannes, jason.low2,
kcc, mhocko, stable, vdavydov, mm-commits
The patch titled
Subject: mm: fix potential data race in SyS_swapon
has been removed from the -mm tree. Its filename was
mm-fix-potential-data-race-in-sys_swapon.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd@google.com>
Subject: mm: fix potential data race in SyS_swapon
While running KernelThreadSanitizer (ktsan) on an upstream kernel with
trinity, we got a few reports from SyS_swapon; here is one of them:
Read of size 8 by thread T307 (K7621):
[< inlined >] SyS_swapon+0x3c0/0x1850 SYSC_swapon mm/swapfile.c:2395
[<ffffffff812242c0>] SyS_swapon+0x3c0/0x1850 mm/swapfile.c:2345
[<ffffffff81e97c8a>] ia32_do_call+0x1b/0x25
Looks like the swap_lock should be taken when iterating through the
swap_info array on lines 2392 - 2401: q->swap_file may be reset to NULL by
another thread before it is dereferenced for f_mapping.
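For illustration only, the locking alternative described in the paragraph above can be modelled in userspace. This is a rough sketch, not kernel code: `file_model`, `info_model`, `slots`, `slots_lock`, `slot_set()` and `mapping_in_use()` are hypothetical names, and a pthread mutex stands in for swap_lock. The patch itself goes a different way and removes the scan altogether.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical stand-ins for struct file and struct swap_info_struct. */
struct file_model { const void *f_mapping; };
struct info_model { struct file_model *swap_file; };

static struct info_model slots[MAX_SLOTS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: install or clear a slot's file pointer under the lock. */
static void slot_set(int i, struct file_model *f)
{
	pthread_mutex_lock(&slots_lock);
	slots[i].swap_file = f;
	pthread_mutex_unlock(&slots_lock);
}

/*
 * Reader side: scan for a duplicate mapping. The pointer is loaded once
 * and dereferenced while the lock is held, so a concurrent slot_set(i, NULL)
 * cannot slip in between the NULL check and the dereference.
 */
static int mapping_in_use(const void *mapping)
{
	int found = 0;

	pthread_mutex_lock(&slots_lock);
	for (int i = 0; i < MAX_SLOTS; i++) {
		struct file_model *f = slots[i].swap_file;

		if (f && f->f_mapping == mapping) {
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&slots_lock);
	return found;
}
```

The racy loop in the original code checks q->swap_file and then reads it again for f_mapping with no lock held; the sketch shows the single locked read that would close that window.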
But why is that iteration needed at all? Doesn't the claim_swapfile()
which follows do all that is needed to check for a duplicate entry -
FMODE_EXCL on a bdev, testing IS_SWAPFILE under i_mutex on a regfile?
Well, not quite: bd_may_claim() allows the same "holder" to claim the bdev
again, so we do need to use a different holder than "sys_swapon"; and we
should not replace appropriate -EBUSY by inappropriate -EINVAL.
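The holder point above can be sketched with a simplified userspace model. It assumes only what the paragraph states, namely that a claim by the current holder is allowed again; `bdev_model` and `claim_bdev()` are hypothetical names, and the real bd_may_claim() does more checks than this.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a block device that records who holds it. */
struct bdev_model {
	const void *holder;	/* NULL when unclaimed */
};

/*
 * Simplified claim rule: succeed if the device is unclaimed or already
 * held by the same holder token; otherwise report -EBUSY to the caller
 * instead of rewriting it to some other error.
 */
static int claim_bdev(struct bdev_model *bdev, const void *holder)
{
	if (bdev->holder != NULL && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}
```

Under this model, two swapon calls that pass one shared token (as with sys_swapon) both succeed on the same device, so the duplicate goes undetected. If each call passes its own token (as with p in the patch), the second call gets -EBUSY.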
The index i was reused in a per-CPU loop further down; it is renamed to cpu there.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swapfile.c | 25 +++++++------------------
1 file changed, 7 insertions(+), 18 deletions(-)
diff -puN mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon mm/swapfile.c
--- a/mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon
+++ a/mm/swapfile.c
@@ -2185,11 +2185,10 @@ static int claim_swapfile(struct swap_in
if (S_ISBLK(inode->i_mode)) {
p->bdev = bdgrab(I_BDEV(inode));
error = blkdev_get(p->bdev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL,
- sys_swapon);
+ FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
if (error < 0) {
p->bdev = NULL;
- return -EINVAL;
+ return error;
}
p->old_block_size = block_size(p->bdev);
error = set_blocksize(p->bdev, PAGE_SIZE);
@@ -2390,7 +2389,6 @@ SYSCALL_DEFINE2(swapon, const char __use
struct filename *name;
struct file *swap_file = NULL;
struct address_space *mapping;
- int i;
int prio;
int error;
union swap_header *swap_header;
@@ -2430,19 +2428,8 @@ SYSCALL_DEFINE2(swapon, const char __use
p->swap_file = swap_file;
mapping = swap_file->f_mapping;
-
- for (i = 0; i < nr_swapfiles; i++) {
- struct swap_info_struct *q = swap_info[i];
-
- if (q == p || !q->swap_file)
- continue;
- if (mapping == q->swap_file->f_mapping) {
- error = -EBUSY;
- goto bad_swap;
- }
- }
-
inode = mapping->host;
+
/* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
error = claim_swapfile(p, inode);
if (unlikely(error))
@@ -2475,6 +2462,8 @@ SYSCALL_DEFINE2(swapon, const char __use
goto bad_swap;
}
if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
+ int cpu;
+
p->flags |= SWP_SOLIDSTATE;
/*
* select a random position to start with to help wear leveling
@@ -2493,9 +2482,9 @@ SYSCALL_DEFINE2(swapon, const char __use
error = -ENOMEM;
goto bad_swap;
}
- for_each_possible_cpu(i) {
+ for_each_possible_cpu(cpu) {
struct percpu_cluster *cluster;
- cluster = per_cpu_ptr(p->percpu_cluster, i);
+ cluster = per_cpu_ptr(p->percpu_cluster, cpu);
cluster_set_null(&cluster->index);
}
}
_
Patches currently in -mm which might be from hughd@google.com are
mm-vmscan-unlock-page-while-waiting-on-writeback.patch
* Re: [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree
From: Hugh Dickins @ 2015-08-24 19:46 UTC
To: akpm
Cc: Mel Gorman, hughd, andreyknvl, cesarb, dvyukov, glider, hannes,
jason.low2, kcc, mhocko, stable, vdavydov, mm-commits
Adding Mel to Cc.
On Mon, 24 Aug 2015, akpm@linux-foundation.org wrote:
>
> The patch titled
> Subject: mm: fix potential data race in SyS_swapon
> has been removed from the -mm tree. Its filename was
> mm-fix-potential-data-race-in-sys_swapon.patch
>
> This patch was dropped because it was merged into mainline or a subsystem tree
Administrative error? I don't see this merged into mainline yet,
and didn't see your usual mail when you send in a batch to Linus.
And I wouldn't want it rushed too quickly to Linus: that stable
tag is barely justified, this is a very narrow race window that
has gone unnoticed for years, and swapon requires CAP_SYS_ADMIN.
But also I spotted Mel proposing a swap-over-NFS patch in this area
on LKML last Thursday: he appeared to be relying on the loop that I
remove here, so he might want to veto this one (though can always
reinstate what he needs later, if that's how it plays out).
Hugh
>
> ------------------------------------------------------
> From: Hugh Dickins <hughd@google.com>
> Subject: mm: fix potential data race in SyS_swapon
>
> While running KernelThreadSanitizer (ktsan) on upstream kernel with
> trinity, we got a few reports from SyS_swapon, here is one of them:
>
> Read of size 8 by thread T307 (K7621):
> [< inlined >] SyS_swapon+0x3c0/0x1850 SYSC_swapon mm/swapfile.c:2395
> [<ffffffff812242c0>] SyS_swapon+0x3c0/0x1850 mm/swapfile.c:2345
> [<ffffffff81e97c8a>] ia32_do_call+0x1b/0x25
>
> Looks like the swap_lock should be taken when iterating through the
> swap_info array on lines 2392 - 2401: q->swap_file may be reset to NULL by
> another thread before it is dereferenced for f_mapping.
>
> But why is that iteration needed at all? Doesn't the claim_swapfile()
> which follows do all that is needed to check for a duplicate entry -
> FMODE_EXCL on a bdev, testing IS_SWAPFILE under i_mutex on a regfile?
>
> Well, not quite: bd_may_claim() allows the same "holder" to claim the bdev
> again, so we do need to use a different holder than "sys_swapon"; and we
> should not replace appropriate -EBUSY by inappropriate -EINVAL.
>
> Index i was reused in a cpu loop further down: renamed cpu there.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov@parallels.com>
> Cc: Jason Low <jason.low2@hp.com>
> Cc: Cesar Eduardo Barros <cesarb@cesarb.net>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/swapfile.c | 25 +++++++------------------
> 1 file changed, 7 insertions(+), 18 deletions(-)
>
> diff -puN mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon mm/swapfile.c
> --- a/mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon
> +++ a/mm/swapfile.c
> @@ -2185,11 +2185,10 @@ static int claim_swapfile(struct swap_in
> if (S_ISBLK(inode->i_mode)) {
> p->bdev = bdgrab(I_BDEV(inode));
> error = blkdev_get(p->bdev,
> - FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> - sys_swapon);
> + FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
> if (error < 0) {
> p->bdev = NULL;
> - return -EINVAL;
> + return error;
> }
> p->old_block_size = block_size(p->bdev);
> error = set_blocksize(p->bdev, PAGE_SIZE);
> @@ -2390,7 +2389,6 @@ SYSCALL_DEFINE2(swapon, const char __use
> struct filename *name;
> struct file *swap_file = NULL;
> struct address_space *mapping;
> - int i;
> int prio;
> int error;
> union swap_header *swap_header;
> @@ -2430,19 +2428,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>
> p->swap_file = swap_file;
> mapping = swap_file->f_mapping;
> -
> - for (i = 0; i < nr_swapfiles; i++) {
> - struct swap_info_struct *q = swap_info[i];
> -
> - if (q == p || !q->swap_file)
> - continue;
> - if (mapping == q->swap_file->f_mapping) {
> - error = -EBUSY;
> - goto bad_swap;
> - }
> - }
> -
> inode = mapping->host;
> +
> /* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
> error = claim_swapfile(p, inode);
> if (unlikely(error))
> @@ -2475,6 +2462,8 @@ SYSCALL_DEFINE2(swapon, const char __use
> goto bad_swap;
> }
> if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
> + int cpu;
> +
> p->flags |= SWP_SOLIDSTATE;
> /*
> * select a random position to start with to help wear leveling
> @@ -2493,9 +2482,9 @@ SYSCALL_DEFINE2(swapon, const char __use
> error = -ENOMEM;
> goto bad_swap;
> }
> - for_each_possible_cpu(i) {
> + for_each_possible_cpu(cpu) {
> struct percpu_cluster *cluster;
> - cluster = per_cpu_ptr(p->percpu_cluster, i);
> + cluster = per_cpu_ptr(p->percpu_cluster, cpu);
> cluster_set_null(&cluster->index);
> }
> }
> _
>
> Patches currently in -mm which might be from hughd@google.com are
>
> mm-vmscan-unlock-page-while-waiting-on-writeback.patch
* Re: [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree
From: Andrew Morton @ 2015-08-24 19:54 UTC
To: Hugh Dickins
Cc: Mel Gorman, andreyknvl, cesarb, dvyukov, glider, hannes,
jason.low2, kcc, mhocko, stable, vdavydov, Al Viro
On Mon, 24 Aug 2015 12:46:54 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> Adding Mel to Cc.
>
> On Mon, 24 Aug 2015, akpm@linux-foundation.org wrote:
> >
> > The patch titled
> > Subject: mm: fix potential data race in SyS_swapon
> > has been removed from the -mm tree. Its filename was
> > mm-fix-potential-data-race-in-sys_swapon.patch
> >
> > This patch was dropped because it was merged into mainline or a subsystem tree
>
> Administrative error? I don't see this merged into mainline yet,
> and didn't see your usual mail when you send in a batch to Linus.
Al Viro grabbed it and put it into linux-next.
He didn't include the cc:stable. I stared at that for a while and
decided to let it all stand - as you say, the -stable backport is
marginal. swapon isn't exactly a high-frequency operation.
> And I wouldn't want it rushed too quickly to Linus: that stable
> tag is barely justified, this is a very narrow race window that
> has gone unnoticed for years, and swapon requires CAP_SYS_ADMIN.
>
> But also I spotted Mel proposing a swap-over-NFS patch in this area
> on LKML last Thursday: he appeared to be relying on the loop that I
> remove here, so he might want to veto this one (though can always
> reinstate what he needs later, if that's how it plays out).
* Re: [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree
From: Hugh Dickins @ 2015-08-24 19:59 UTC
To: Andrew Morton
Cc: Hugh Dickins, Mel Gorman, andreyknvl, cesarb, dvyukov, glider,
hannes, jason.low2, kcc, mhocko, stable, vdavydov, Al Viro
On Mon, 24 Aug 2015, Andrew Morton wrote:
> On Mon, 24 Aug 2015 12:46:54 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>
> > Adding Mel to Cc.
> >
> > On Mon, 24 Aug 2015, akpm@linux-foundation.org wrote:
> > >
> > > The patch titled
> > > Subject: mm: fix potential data race in SyS_swapon
> > > has been removed from the -mm tree. Its filename was
> > > mm-fix-potential-data-race-in-sys_swapon.patch
> > >
> > > This patch was dropped because it was merged into mainline or a subsystem tree
> >
> > Administrative error? I don't see this merged into mainline yet,
> > and didn't see your usual mail when you send in a batch to Linus.
>
> Al Viro grabbed it and put it into linux-next.
>
> He didn't include the cc:stable. I stared at that for a while and
> decided to let it all stand - as you say, the -stable backport is
> marginal. swapon isn't exactly a high-frequency operation.
Ah, thanks for the explanation, that makes sense: yes,
I Cc'ed Al on the original, as it is as much in his area as in ours.
Hugh
>
> > And I wouldn't want it rushed too quickly to Linus: that stable
> > tag is barely justified, this is a very narrow race window that
> > has gone unnoticed for years, and swapon requires CAP_SYS_ADMIN.
> >
> > But also I spotted Mel proposing a swap-over-NFS patch in this area
> > on LKML last Thursday: he appeared to be relying on the loop that I
> > remove here, so he might want to veto this one (though can always
> > reinstate what he needs later, if that's how it plays out).
* Re: [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree
From: Mel Gorman @ 2015-08-28 14:50 UTC
To: Hugh Dickins
Cc: akpm, andreyknvl, cesarb, dvyukov, glider, hannes, jason.low2,
kcc, mhocko, stable, vdavydov, mm-commits
On Mon, Aug 24, 2015 at 12:46:54PM -0700, Hugh Dickins wrote:
> Adding Mel to Cc.
>
> On Mon, 24 Aug 2015, akpm@linux-foundation.org wrote:
> >
> > The patch titled
> > Subject: mm: fix potential data race in SyS_swapon
> > has been removed from the -mm tree. Its filename was
> > mm-fix-potential-data-race-in-sys_swapon.patch
> >
> > This patch was dropped because it was merged into mainline or a subsystem tree
>
> Administrative error? I don't see this merged into mainline yet,
> and didn't see your usual mail when you send in a batch to Linus.
>
> And I wouldn't want it rushed too quickly to Linus: that stable
> tag is barely justified, this is a very narrow race window that
> has gone unnoticed for years, and swapon requires CAP_SYS_ADMIN.
>
> But also I spotted Mel proposing a swap-over-NFS patch in this area
> on LKML last Thursday: he appeared to be relying on the loop that I
> remove here, so he might want to veto this one (though can always
> reinstate what he needs later, if that's how it plays out).
>
I don't think we will have a problem. The swap-over-NFS patch collides
with yours but not in a way that matters. I'll see how things look
after the merge window but I think I'll be able to limit the scope of
the lock further and still avoid the use of i_mutex.
--
Mel Gorman
SUSE Labs