public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: Gregory Haskins <gregory.haskins@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	avi@redhat.com, "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH 4/4] Convert irq notifiers lists to RCU locking.
Date: Mon, 13 Jul 2009 17:08:49 +0300	[thread overview]
Message-ID: <20090713140849.GQ28046@redhat.com> (raw)
In-Reply-To: <4A5B3E65.8060309@gmail.com>

On Mon, Jul 13, 2009 at 10:02:13AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Mon, Jul 13, 2009 at 09:40:01AM -0400, Gregory Haskins wrote:
> >   
> >> Gleb Natapov wrote:
> >>     
> >>> On Mon, Jul 13, 2009 at 09:26:21AM -0400, Gregory Haskins wrote:
> >>>   
> >>>       
> >>>> Gleb Natapov wrote:
> >>>>     
> >>>>         
> >>>>> On Mon, Jul 13, 2009 at 04:02:56PM +0300, Michael S. Tsirkin wrote:
> >>>>>   
> >>>>>       
> >>>>>           
> >>>>>> On Sun, Jul 12, 2009 at 03:03:53PM +0300, Gleb Natapov wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>> Use RCU locking for mask/ack notifiers lists.
> >>>>>>>
> >>>>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>>>>>> ---
> >>>>>>>  virt/kvm/irq_comm.c |   20 +++++++++++---------
> >>>>>>>  1 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> >>>>>>> index 5dde1ef..ba3a115 100644
> >>>>>>> --- a/virt/kvm/irq_comm.c
> >>>>>>> +++ b/virt/kvm/irq_comm.c
> >>>>>>> @@ -179,18 +179,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> >>>>>>>  			break;
> >>>>>>>  		}
> >>>>>>>  	}
> >>>>>>> -	rcu_read_unlock();
> >>>>>>>  
> >>>>>>> -	hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>> +	hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>>  		if (kian->gsi == gsi)
> >>>>>>>  			kian->irq_acked(kian);
> >>>>>>> +	rcu_read_unlock();
> >>>>>>>  }
> >>>>>>>  
> >>>>>>>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				   struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>> +	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> @@ -198,8 +198,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				    struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_del_init(&kian->link);
> >>>>>>> +	hlist_del_init_rcu(&kian->link);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>> +	synchronize_rcu();
> >>>>>>>  }
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>>> This is done under kvm->lock still, which means the lock might be held
> >>>>>> potentially for a very long time. Can synchronize_rcu be moved out of
> >>>>>> this lock?
> >>>>>>
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>> Only if kvm_free_assigned_device() will be moved out of this lock.
> >>>>> Device de-assignment is not very frequent event though. How long do you
> >>>>> think it may be held? KVM RCU read sections are very brief.
> >>>>>   
> >>>>>       
> >>>>>           
> >>>> Note that the delay imposed by the barrier is not only related to the
> >>>> length of the critical section.  The barrier blocks until the next grace
> >>>> period, and depending on the type of RCU you are using and your config
> >>>> options, this could be multiple milliseconds.
> >>>>
> >>>> I am not saying that this is definitely a problem for your design.   I
> >>>> am just pointing out that the length of the KVM-RCU read section is only
> >>>>     
> >>>>         
> >>> Yeah I understand that other RCU read section may introduce delays too.
> >>> The question is how big the delay may be.
> >>>       
> >> I think you are misunderstanding me.  The read-side CS is not a
> >> significant factor here so I am not worried about concurrent read-side
> >> CS causing a longer delay.  What I am saying is that the grace period of
> >> your RCU subsystem is the dominant factor in the equation here, and this
> >> may be several milliseconds.
> >>
> >>     
> > How is the "grace period" is determined? Isn't it just means "no cpus is
> > in RCU read section anymore"?
> >   
> 
> Nope ;)
> 
Now I recall something about each CPU passing scheduler. Thanks.

> RCU is pretty complex, so I won't even try to explain it here as there
> are numerous articles floating around out there that do a much better job.
> 
> But here is a summary:  RCU buys you two things: 1) concurrent readers
> *and* writers, and 2) a much lower overhead reader path because it
> generally doesn't use atomic.  Its point (2) that is relevant here.
> 
> If taking an atomic were ok, you could approximate the RCU model using
> reference counting.  Reference counting buys you "precise" resource
> acquistion/release at the expense of the overhead of the atomic
> operation (and any associated cache-line bouncing).  RCU uses a
> "imprecise" model where we don't really know the *exact* moment the
> resource is released.  Instead, there are specific boundaries in time
> when we can guarantee that it had to have been released prior to the
> expiry of the event.  This event is what is called the "grace period".
> 
> So that is what synchronize_rcu() is doing.  Its a barrier to the next
> imprecise moment in time when we can be assured (if you used the rest of
> the RCU API properly) that there can not be any outstanding references
> to your object left in flight.  Each grace period can be milliseconds,
> depending on what version of the kernel you have and how it is configured.
> 
> HTH
> 
> Kind Regards,
> -Greg
> 
> >   
> >>>  I don't think multiple
> >>> milliseconds delay in device de-assignment is a big issue though.
> >>>   
> >>>       
> >> I would tend to agree with you.  It's not fast path.
> >>
> >> I only brought this up because I saw your design being justified
> >> incorrectly:  you said "KVM RCU read sections are very brief", but that
> >> is not really relevant to Michael's point.  I just want to make sure
> >> that the true impact is understood.
> >>
> >> Kind Regards,
> >> -Greg
> >>
> >>
> >>     
> >
> >
> >
> > --
> > 			Gleb.
> >   
> 
> 



--
			Gleb.

  reply	other threads:[~2009-07-13 14:08 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-12 12:03 [PATCH 0/4] moving irq routing and notifiers to RCU locking Gleb Natapov
2009-07-12 12:03 ` [PATCH 1/4] Move irq routing data structure to rcu locking Gleb Natapov
2009-07-13 12:55   ` Michael S. Tsirkin
2009-07-13 13:03     ` Gleb Natapov
2009-07-13 13:15       ` Michael S. Tsirkin
2009-07-13 13:23         ` Gleb Natapov
2009-07-13 13:36           ` Michael S. Tsirkin
2009-07-13 13:01   ` Gregory Haskins
2009-07-13 13:15     ` Gleb Natapov
2009-07-13 13:16       ` Gregory Haskins
2009-07-13 13:25         ` Gleb Natapov
2009-07-13 13:29           ` Gregory Haskins
2009-07-13 15:55       ` Marcelo Tosatti
2009-07-13 16:24         ` Gleb Natapov
2009-07-13 16:27           ` Marcelo Tosatti
2009-07-13 16:33             ` Gleb Natapov
2009-07-13 16:42               ` Marcelo Tosatti
2009-07-13 16:44                 ` Gleb Natapov
2009-07-13 16:45                   ` Marcelo Tosatti
2009-07-13 16:54                     ` Gleb Natapov
2009-07-12 12:03 ` [PATCH 2/4] Unregister ack notifier callback on PIT freeing Gleb Natapov
2009-07-12 12:03 ` [PATCH 3/4] Move irq ack notifier list to arch independent code Gleb Natapov
2009-07-12 12:03 ` [PATCH 4/4] Convert irq notifiers lists to RCU locking Gleb Natapov
2009-07-13 12:56   ` Michael S. Tsirkin
2009-07-13 13:05     ` Gleb Natapov
2009-07-13 13:29       ` Michael S. Tsirkin
2009-07-13 13:48     ` Gregory Haskins
2009-07-13 13:02   ` Michael S. Tsirkin
2009-07-13 13:11     ` Gleb Natapov
2009-07-13 13:26       ` Gregory Haskins
2009-07-13 13:32         ` Gleb Natapov
2009-07-13 13:40           ` Gregory Haskins
2009-07-13 13:52             ` Gleb Natapov
2009-07-13 14:02               ` Gregory Haskins
2009-07-13 14:08                 ` Gleb Natapov [this message]
2009-07-13 13:40           ` Michael S. Tsirkin
2009-07-13 13:44             ` Gregory Haskins
2009-07-13 19:31             ` Paul E. McKenney
2009-07-14  5:46               ` Gleb Natapov
2009-07-14 12:03                 ` Paul E. McKenney
2009-07-14 12:06                   ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090713140849.GQ28046@redhat.com \
    --to=gleb@redhat.com \
    --cc=avi@redhat.com \
    --cc=gregory.haskins@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox