From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: syzbot <syzbot+e58112d71f77113ddb7b@syzkaller.appspotmail.com>,
	aarcange@redhat.com, akpm@linux-foundation.org,
	christian@brauner.io, davem@davemloft.net, ebiederm@xmission.com,
	elena.reshetova@intel.com, guro@fb.com, hch@infradead.org,
	james.bottomley@hansenpartnership.com, jglisse@redhat.com,
	keescook@chromium.org, ldv@altlinux.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-parisc@vger.kernel.org, luto@amacapital.net,
	mhocko@suse.com, mingo@kernel.org, namit@vmware.com,
	peterz@infradead.org, syzkaller-bugs@googlegroups.com,
	viro@zeniv.linux.org.uk, wad@chromium.org
Subject: Re: WARNING in __mmdrop
Date: Fri, 26 Jul 2019 09:47:12 -0400
Message-ID: <20190726094353-mutt-send-email-mst@kernel.org>
In-Reply-To: <ada10dc9-6cab-e189-5289-6f9d3ff8fed2@redhat.com>

On Fri, Jul 26, 2019 at 08:53:18PM +0800, Jason Wang wrote:
> 
> On 2019/7/26 8:38 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 26, 2019 at 08:00:58PM +0800, Jason Wang wrote:
> > > On 2019/7/26 7:49 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 25, 2019 at 10:25:25PM +0800, Jason Wang wrote:
> > > > > On 2019/7/25 9:26 PM, Michael S. Tsirkin wrote:
> > > > > > > Exactly, and that's the reason actually I use synchronize_rcu() there.
> > > > > > > 
> > > > > > > So the concern is still the possible synchronize_expedited()?
> > > > > > I think synchronize_srcu_expedited.
> > > > > > 
> > > > > > synchronize_expedited sends lots of IPI and is bad for realtime VMs.
> > > > > > 
> > > > > > > > Can I do this through another series on top of
> > > > > > > > the incoming V2?
> > > > > > > 
> > > > > > > Thanks
> > > > > > > 
> > > > > > The question is this: is this still a gain if we switch to the
> > > > > > more expensive srcu? If yes then we can keep the feature on,
> > > > > > I think we only care about the cost of srcu_read_lock(), which looks pretty
> > > > > > tiny from my point of view: it is basically a READ_ONCE() + WRITE_ONCE().
> > > > > 
> > > > > Of course I can benchmark to see the difference.
> > > > > 
> > > > > 
> > > > > > if not we'll put it off until next release and think
> > > > > > of better solutions. rcu->srcu is just a find and replace,
> > > > > > don't see why we need to defer that. can be a separate patch
> > > > > > for sure, but we need to know how well it works.
> > > > > > I think I get it, let me try to do that in V2 and let's see the numbers.
> > > > > 
> > > > > Thanks
> > > 
> > > > It looks to me that for tree RCU, srcu_read_lock() has an mb(), which is too
> > > > expensive for us.
> > I will try to ponder using vq lock in some way.
> > Maybe with trylock somehow ...
> 
> 
> Ok, let me retry if necessary (but I do remember I ended up with deadlocks
> on the last try).
> 
> 
> > 
> > 
> > > If we just worry about the IPI,
> > With synchronize_rcu what I would worry about is that guest is stalled
> 
> 
> Can this synchronize_rcu() be triggered by guest? If yes, there are several
> other MMU notifiers that can block. Is vhost something special here?

Sorry, let me explain: guests (and tasks in general) can trigger
activity that will make synchronize_rcu take a long time. Thus
blocking an mmu notifier until synchronize_rcu finishes is a bad idea.

> 
> > because system is busy because of other guests.
> > With expedited it's the IPIs...
> > 
> 
> The current synchronize_rcu() can force an expedited grace period:
> 
> void synchronize_rcu(void)
> {
>         ...
>         if (rcu_blocking_is_gp())
>                 return;
>         if (rcu_gp_is_expedited())
>                 synchronize_rcu_expedited();
>         else
>                 wait_rcu_gp(call_rcu);
> }
> EXPORT_SYMBOL_GPL(synchronize_rcu);


An admin can force rcu to finish faster, trading
interrupts for responsiveness.
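
For reference, a minimal userspace sketch of how one could check which
way a given host is set up. It assumes the /sys/kernel/rcu_expedited and
/sys/kernel/rcu_normal knobs are exposed (not every config has them);
read_knob() and rcu_sync_mode() are hypothetical helper names made up
for this illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

enum sync_mode { SYNC_NORMAL, SYNC_EXPEDITED };

/*
 * Read a non-negative integer from a sysfs-style knob file.
 * Returns -1 if the file cannot be opened or parsed.
 */
static int read_knob(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Which path synchronize_rcu() is expected to take for the given knob
 * values. Assumption for this sketch: rcu_normal, when set, overrides
 * rcu_expedited. A missing knob (-1) is treated as "not set".
 */
static enum sync_mode rcu_sync_mode(int expedited, int normal)
{
	if (normal > 0)
		return SYNC_NORMAL;
	return expedited > 0 ? SYNC_EXPEDITED : SYNC_NORMAL;
}
```

Usage would be along the lines of
rcu_sync_mode(read_knob("/sys/kernel/rcu_expedited"),
read_knob("/sys/kernel/rcu_normal")). This only reports the admin
setting; it does not account for the kernel temporarily expediting
grace periods on its own.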

> 
> > > can we do something like in
> > > vhost_invalidate_vq_start()?
> > > 
> > >          if (map) {
> > >                  /* In order to avoid possible IPIs with
> > >                   * synchronize_rcu_expedited() we use call_rcu() +
> > >                   * completion.
> > >                   */
> > >                  init_completion(&c.completion);
> > >                  call_rcu(&c.rcu_head, vhost_finish_vq_invalidation);
> > >                  wait_for_completion(&c.completion);
> > >                  vhost_set_map_dirty(vq, map, index);
> > >                  vhost_map_unprefetch(map);
> > >          }
> > > 
> > > ?
> > Why would that be faster than synchronize_rcu?
> 
> 
> No faster but no IPI.
> 

Sorry I still don't see the point.
synchronize_rcu doesn't normally do an IPI either.
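
To make the comparison concrete, here is a single-threaded toy model in
plain userspace C, not kernel code. Every toy_* name is invented for
this sketch. Grace periods are simulated by an explicit function call,
and the expedited path is crudely modelled as one IPI per CPU (real
expedited grace periods only interrupt CPUs that need it). The
non-expedited toy_synchronize_rcu() is deliberately written as
call_rcu + completion, mirroring the wait_rcu_gp(call_rcu) branch
quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_rcu {
	struct toy_rcu_head *pending;	/* callbacks waiting for a grace period */
	unsigned long gp_count;		/* normal grace periods completed */
	unsigned long ipis_sent;	/* only the expedited path bumps this */
	unsigned int ncpus;
};

struct toy_completion {
	bool done;
	struct toy_rcu_head head;
};

/* Queue a callback to run after the next simulated grace period. */
static void toy_call_rcu(struct toy_rcu *rcu, struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	assert(func != NULL);
	head->func = func;
	head->next = rcu->pending;
	rcu->pending = head;
}

/* End one normal grace period: run every queued callback, no IPIs. */
static void toy_grace_period(struct toy_rcu *rcu)
{
	struct toy_rcu_head *list = rcu->pending;

	rcu->pending = NULL;
	rcu->gp_count++;
	while (list) {
		struct toy_rcu_head *next = list->next;

		list->func(list);
		list = next;
	}
}

/* Callback that marks the enclosing completion as done. */
static void toy_complete(struct toy_rcu_head *head)
{
	struct toy_completion *c = (struct toy_completion *)
		((char *)head - offsetof(struct toy_completion, head));

	c->done = true;
}

/* The proposed pattern: call_rcu() + wait_for_completion(). */
static unsigned long toy_wait_via_call_rcu(struct toy_rcu *rcu)
{
	struct toy_completion c = { .done = false };
	unsigned long start = rcu->gp_count;

	toy_call_rcu(rcu, &c.head, toy_complete);
	while (!c.done)
		toy_grace_period(rcu);	/* stand-in for sleeping until the GP ends */
	return rcu->gp_count - start;	/* grace periods waited */
}

/* Toy synchronize_rcu(): returns the number of grace periods waited. */
static unsigned long toy_synchronize_rcu(struct toy_rcu *rcu, bool expedited)
{
	if (expedited) {
		rcu->ipis_sent += rcu->ncpus;	/* crude: one IPI per CPU */
		return 1;
	}
	return toy_wait_via_call_rcu(rcu);
}
```

In this model both toy_wait_via_call_rcu() and the non-expedited
toy_synchronize_rcu() wait exactly one grace period and leave
ipis_sent at 0; only the expedited branch sends IPIs.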


> > 
> > 
> > > > There's one other thing that bothers me, and that is that
> > > > for large rings which are not physically contiguous
> > > > we don't implement the optimization.
> > > > 
> > > > For sure, that can wait, but I think eventually we should
> > > > vmap large rings.
> > > 
> > > Yes, worth trying. But using the direct map has its own advantage: it can use
> > > hugepages, which vmap can't.
> > > 
> > > Thanks
> > Sure, so we can do that for small rings.
> 
> 
> Yes, that's possible but should be done on top.
> 
> Thanks

Absolutely. Need to fix up the bugs first.

-- 
MST
