From: Gregory Haskins <ghaskins@novell.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
avi@redhat.com, davidel@xmailserver.org,
paulmck@linux.vnet.ibm.com, mingo@elte.hu
Subject: Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Date: Tue, 16 Jun 2009 11:20:18 -0400 [thread overview]
Message-ID: <4A37B832.6040206@novell.com> (raw)
In-Reply-To: <20090616145502.GA1102@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5846 bytes --]
Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 10:48:27AM -0400, Gregory Haskins wrote:
>
>>>>>> +static void _eventfd_notify(struct eventfd_ctx *ctx)
>>>>>> +{
>>>>>> + struct eventfd_notifier *en;
>>>>>> + int idx;
>>>>>> +
>>>>>> + idx = srcu_read_lock(&ctx->srcu);
>>>>>> +
>>>>>> + /*
>>>>>> + * The goal here is to allow the notification to be preemptible
>>>>>> + * as often as possible. We cannot achieve this with the basic
>>>>>> + * wqh mechanism because it requires the wqh->lock. Therefore
>>>>>> + * we have an internal srcu list mechanism of which the wqh is
>>>>>> + * a client.
>>>>>> + *
>>>>>> + * Not all paths will invoke this function in process context.
>>>>>> + * Callers should check for suitable state before assuming they
>>>>>> + * can sleep (such as with preemptible()). Paul McKenney assures
>>>>>> + * me that srcu_read_lock is compatible with in-atomic, as long as
>>>>>> + * the code within the critical section is also compatible.
>>>>>> + */
>>>>>> + list_for_each_entry_rcu(en, &ctx->nh, list)
>>>>>> + en->ops->signal(en);
>>>>>> +
>>>>>> + srcu_read_unlock(&ctx->srcu, idx);
>>>>>> +}
>>>>>> +
>>>>>> /*
>>>>>> * Adds "n" to the eventfd counter "count". Returns "n" in case of
>>>>>> * success, or a value lower then "n" in case of coutner overflow.
>>>>>>
>>>>>>
>>>>>>
>>>>> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false.
>>>>>
>>>>> Further, to do useful things it might not be enough that you can sleep:
>>>>> with iofd you also want to access current task with e.g. copy from user.
>>>>>
>>>>> Here's an idea: let's pass a flag to ->signal, along the lines of
>>>>> signal_is_task, that tells us that it is safe to use current, and add
>>>>> eventfd_signal_task() which is the same as eventfd_signal but lets everyone
>>>>> know that it's safe to both sleep and use current->mm.
>>>>>
>>>>> Makes sense?
>>>>>
>>>>>
>>>>>
>>>> It does make sense, yes. What I am not clear on is how would eventfd
>>>> detect this state such as to populate such flags, and why cant the
>>>> ->signal() CB do the same?
>>>>
>>>> Thanks Michael,
>>>> -Greg
>>>>
>>>>
>>>>
>>> eventfd can't detect this state. But the callers know in what context they are.
>>> So the *caller* of eventfd_signal_task makes sure of this: if you are in a task,
>>> you can call eventfd_signal_task() if not, you must call eventfd_signal.
>>>
>>>
>>>
>>>
>> Hmm, this is an interesting idea, but I think it would be problematic in
>> real-world applications for the long-term. For instance, the -rt tree
>> and irq-threads .config option in the process of merging into mainline
>> changes context types for established code. Therefore, what might be
>> "hardirq/softirq" logic today may execute in a kthread tomorrow.
>>
>
> That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is just
> an optimization. I think everyone not in the context of a system call or vmexit
> can just call eventfd_signal_task.
>
^^^^^^^^^^^^^^^^^^^^
I assume you meant s/eventfd_signal_task/eventfd_signal there?
>
>> I
>> think its dangerous to try to solve the problem with caller provided
>> info: the caller may be ignorant of its true state.
>>
>
> I assume this wasn't clear enough: the idea is that you only
> calls eventfd_signal_task if you know you are on a systemcall path.
> If you are ignorant of the state, call eventfd_signal.
>
Well, its not a matter of correctness. Its more for optimal
performance. If I have PCI pass-though injecting interrupts from
hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the
hardirq is transparently converted to a kthread, so technically
eventfd_signal_task() would work (at least for the can_sleep() part, not
for current->mm per se). But in this case, the PCI logic would not know
it was converted to a kthread. It all happens transparently in the
low-level code and the pci code is unmodified.
In this case, your proposal would have the passthrough path invoking
irqfd with eventfd_signal(). It would therefore still shunt to a
workqueue to inject the interrupt, even though it would have been
perfectly fine to just inject it directly because taking
mutex_lock(&kvm->irq_lock) is legal. Perhaps I am over-optimizing, but
this is the scenario I am concerned about and what I was trying to
address with preemptible()/can_sleep().
I think your idea is a good one to address the current->mm portion. It
would only ever be safe to access the MM context from syscall/vmexit
context, as you point out. Therefore, I see no problem with
implementing something like iosignalfd with eventfd_signal_task().
But accessing current->mm is only a subset of the use-cases. The other
use-cases would include the ability to sleep, and possibly the ability
to address other->mm. For these latter cases, I really only need the
"can_sleep()" behavior, not the full blown "can_access_current_mm()".
Additionally, the eventfd_signal_task() data at least for iosignalfd is
superfluous: I already know that I can access current->mm by virtue of
the design.
So since I cannot use it accurately for the hardirq/threaded-irq type
case, and I don't actually need it for the iosignalfd case, I am not
sure its the right direction (at least for now). I do think it might
have merit for syscal/vmexit uses outside of iosignalfd, but I do not
see a current use-case for it so perhaps it can wait until one arises.
-Greg
>
>> IMO, the ideal
>> solution needs to be something we can detect at run-time.
>>
>> Thanks Michael,
>> -Greg
>>
>>
>
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]
next prev parent reply other threads:[~2009-06-16 15:20 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-16 2:29 [KVM-RFC PATCH 0/2] eventfd enhancements for irqfd/iosignalfd Gregory Haskins
2009-06-16 2:29 ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Gregory Haskins
2009-06-16 14:02 ` Michael S. Tsirkin
2009-06-16 14:11 ` Gregory Haskins
2009-06-16 14:38 ` Michael S. Tsirkin
2009-06-16 14:48 ` Gregory Haskins
2009-06-16 14:54 ` Gregory Haskins
2009-06-16 15:16 ` Michael S. Tsirkin
2009-06-16 14:55 ` Michael S. Tsirkin
2009-06-16 15:20 ` Gregory Haskins [this message]
2009-06-16 15:41 ` Michael S. Tsirkin
2009-06-16 16:17 ` Gregory Haskins
2009-06-16 16:19 ` Davide Libenzi
2009-06-16 17:01 ` Gregory Haskins
2009-06-17 16:38 ` Davide Libenzi
2009-06-17 17:28 ` Gregory Haskins
2009-06-17 17:44 ` Davide Libenzi
2009-06-17 19:17 ` Gregory Haskins
2009-06-17 19:50 ` Davide Libenzi
2009-06-17 21:48 ` Gregory Haskins
2009-06-17 23:21 ` Davide Libenzi
2009-06-18 6:23 ` Michael S. Tsirkin
2009-06-18 17:52 ` Davide Libenzi
2009-06-18 14:01 ` Gregory Haskins
2009-06-18 14:01 ` Gregory Haskins
2009-06-18 17:44 ` Davide Libenzi
2009-06-18 19:04 ` Gregory Haskins
2009-06-18 19:04 ` Gregory Haskins
2009-06-18 22:03 ` Davide Libenzi
2009-06-18 22:47 ` Gregory Haskins
2009-06-18 22:47 ` Gregory Haskins
2009-06-19 18:51 ` Gregory Haskins
2009-06-19 18:51 ` [PATCH 1/3] eventfd: Allow waiters to be notified about the eventfd file* going away Gregory Haskins
2009-06-19 18:51 ` [PATCH 2/3] eventfd: add generalized notifier interface Gregory Haskins
2009-06-19 18:51 ` [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions Gregory Haskins
2009-06-19 19:10 ` Davide Libenzi
2009-06-19 21:16 ` Gregory Haskins
2009-06-19 21:26 ` Davide Libenzi
2009-06-19 21:49 ` Gregory Haskins
2009-06-19 21:54 ` Davide Libenzi
2009-06-19 22:47 ` Davide Libenzi
2009-06-20 2:09 ` Gregory Haskins
2009-06-20 21:17 ` Davide Libenzi
2009-06-20 22:11 ` Davide Libenzi
2009-06-20 23:48 ` Davide Libenzi
2009-06-21 1:14 ` Gregory Haskins
2009-06-21 16:51 ` Davide Libenzi
2009-06-21 18:39 ` Gregory Haskins
2009-06-21 23:54 ` Davide Libenzi
2009-06-22 16:05 ` Gregory Haskins
2009-06-22 16:05 ` Gregory Haskins
2009-06-22 17:01 ` Davide Libenzi
2009-06-22 17:43 ` Gregory Haskins
2009-06-22 17:43 ` Gregory Haskins
2009-06-22 18:03 ` Davide Libenzi
2009-06-22 18:31 ` Gregory Haskins
2009-06-22 18:31 ` Gregory Haskins
2009-06-22 18:40 ` Davide Libenzi
2009-06-22 18:41 ` Michael S. Tsirkin
2009-06-22 18:51 ` Davide Libenzi
2009-06-22 19:05 ` Michael S. Tsirkin
2009-06-22 19:26 ` Gregory Haskins
2009-06-22 19:29 ` Davide Libenzi
2009-06-22 20:06 ` Gregory Haskins
2009-06-22 22:53 ` Davide Libenzi
2009-06-23 1:03 ` Gregory Haskins
2009-06-23 1:17 ` Davide Libenzi
2009-06-23 1:26 ` Gregory Haskins
2009-06-23 1:26 ` Gregory Haskins
2009-06-23 14:29 ` Davide Libenzi
2009-06-23 14:37 ` Gregory Haskins
2009-06-23 14:37 ` Gregory Haskins
2009-06-23 14:35 ` Davide Libenzi
2009-06-23 14:42 ` Gregory Haskins
2009-06-23 14:42 ` Gregory Haskins
2009-06-23 15:04 ` Michael S. Tsirkin
2009-06-22 20:28 ` Michael S. Tsirkin
2009-06-22 19:16 ` Gregory Haskins
2009-06-22 19:54 ` Davide Libenzi
2009-06-24 3:25 ` Rusty Russell
2009-06-24 22:45 ` Davide Libenzi
2009-06-25 11:42 ` Rusty Russell
2009-06-25 16:34 ` Davide Libenzi
2009-06-25 17:32 ` Gregory Haskins
2009-06-25 18:26 ` Michael S. Tsirkin
2009-06-25 18:41 ` Gregory Haskins
2009-06-26 11:23 ` Michael S. Tsirkin
2009-06-23 3:25 ` Rusty Russell
2009-06-23 14:31 ` Davide Libenzi
2009-06-25 0:19 ` Davide Libenzi
2009-06-21 1:05 ` Gregory Haskins
2009-06-16 17:54 ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Michael S. Tsirkin
2009-06-16 18:09 ` Gregory Haskins
2009-06-17 14:45 ` Michael S. Tsirkin
2009-06-17 15:02 ` Gregory Haskins
2009-06-17 16:25 ` Michael S. Tsirkin
2009-06-17 16:41 ` Gregory Haskins
2009-06-16 14:17 ` Gregory Haskins
2009-06-16 14:22 ` Gregory Haskins
2009-06-16 14:40 ` Gregory Haskins
2009-06-16 14:46 ` Michael S. Tsirkin
2009-06-18 9:03 ` Avi Kivity
2009-06-18 11:43 ` Gregory Haskins
2009-06-16 2:30 ` [KVM-RFC PATCH 2/2] eventfd: add module reference counting support for registered notifiers Gregory Haskins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A37B832.6040206@novell.com \
--to=ghaskins@novell.com \
--cc=avi@redhat.com \
--cc=davidel@xmailserver.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mst@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.