From: "Jan Beulich"
Subject: Re: [xen-unstable test] 6947: regressions - trouble: broken/fail/pass
Date: Mon, 02 May 2011 15:04:47 +0100
Message-ID: <4DBED61F020000780003F2D3@vpn.id2.novell.com>
References: <4DBEBFE7020000780003F29F@vpn.id2.novell.com>
To: Keir Fraser
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

>>> On 02.05.11 at 15:14, Keir Fraser wrote:
> On 02/05/2011 13:29, "Jan Beulich" wrote:
>
>>>>> On 02.05.11 at 14:19, Keir Fraser wrote:
>>> On 02/05/2011 13:00, "Jan Beulich" wrote:
>>>
>>>>> (3) Restructure the interrupt code to do less work in IRQ context.
>>>>> For example tasklet-per-irq, and schedule on the local cpu. Protect
>>>>> a bunch of the PIRQ structures with a non-IRQ lock. Would increase
>>>>> interrupt latency if the local CPU is interrupted in hypervisor
>>>>> context. I'm not sure about this one -- I'm not that happy about the
>>>>> amount of work now done in hardirq context, but I'm not sure on the
>>>>> performance impact of deferring the work.
>>>>
>>>> I'm not inclined to make changes in this area for the purpose at hand
>>>> either (again, Linux gets away without this - would have to check how
>>>> e.g. KVM gets the TLB flushing done, or whether they don't defer
>>>> flushes like we do).
>>>
>>> Oh, another way would be to make lookup_slot invocations from IRQ
>>> context be RCU-safe. Then the radix tree updates would not have to
>>> synchronise on the irq_desc lock? And I believe Linux has examples of
>>> RCU-safe usage of radix trees.
>>
>> I'm not sure - the patch doesn't introduce the locking (i.e. the
>> translation arrays used without the patch also get updated under
>> lock). I'm also not certain about slot recycling aspects (i.e. what
>> would the result be if freeing slots got deferred via RCU, but the
>> same slot is then needed to be used again before the grace period
>> expires). Quite possibly this consideration is moot, just resulting
>> from my only half-baked understanding of RCU...
>
> The most straightforward way to convert to RCU with the most similar
> synchronising semantics would be to add a 'live' boolean flag to each
> pirq-related struct that is stored in a radix tree. Then:
> * insertions into the radix tree would be moved before acquisition of
> the irq_desc lock, with 'live' then being set under the lock;
> * deletions would clear 'live' under the lock, with the actual radix
> deletion happening after irq_desc lock release;
> * lookups would happen as usual under the irq_desc lock, but with an
> extra test of the 'live' flag.

This still leaves unclear to me how an insert hitting a not-yet-deleted
(but no longer live) entry should behave. Simply setting 'live' again
won't help, as that wouldn't cover a delete->insert->delete sequence all
happening before the first delete's grace period expires. Nor would this
work with populating the new data (prior to setting 'live') while the
old data might still be used.
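For concreteness, the scheme as I read it would look roughly like the
sketch below (struct, field, and helper names are made up purely for
illustration, not actual Xen code; the radix_tree_*() calls are assumed
to have their usual Linux-style signatures):

struct pirq_entry {
    bool live;                    /* set/cleared under desc->lock */
    /* ... per-pirq state ... */
};

/* Lookup (IRQ context, under desc->lock as before): only the 'live'
 * test is new. */
static struct pirq_entry *pirq_lookup(struct radix_tree_root *tree,
                                      unsigned long pirq)
{
    struct pirq_entry *e = radix_tree_lookup(tree, pirq);

    return ( e && e->live ) ? e : NULL;
}

/* Insert: the physical tree update moves ahead of the desc->lock
 * critical section; publication happens under the lock. */
static int pirq_insert(struct irq_desc *desc,
                       struct radix_tree_root *tree,
                       unsigned long pirq, struct pirq_entry *e)
{
    int rc = radix_tree_insert(tree, pirq, e);  /* e->live still false */

    if ( rc )
        return rc;
    spin_lock_irq(&desc->lock);
    e->live = true;
    spin_unlock_irq(&desc->lock);
    return 0;
}

/* Delete: unpublish under the lock, physical removal afterwards. In
 * the RCU variant the entry would additionally be freed only after a
 * grace period - and an insert for the same pirq arriving before that
 * grace period expires is exactly the unclear case above. */
static void pirq_delete(struct irq_desc *desc,
                        struct radix_tree_root *tree,
                        unsigned long pirq)
{
    struct pirq_entry *e;

    spin_lock_irq(&desc->lock);
    e = radix_tree_lookup(tree, pirq);
    if ( e )
        e->live = false;
    spin_unlock_irq(&desc->lock);
    radix_tree_delete(tree, pirq);
    /* xfree(e) would have to be RCU-deferred here. */
}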
But wait - what you describe doesn't need RCU anymore, at least as long
as the code paths from the radix tree insert to the setting of 'live'
(and similarly from the clearing of 'live' to the radix tree delete)
are fully covered by some other lock (d->event_lock, see below). Am I
overlooking something?

> The main complexity of this approach would probably be in breaking up
> the insertions/deletions across the irq_desc-lock critical section.
> Basically the 'live' flag update would happen wherever the
> insertion/deletion happens right now, but the physical
> insertion/deletion would be moved respectively earlier/later.
>
> We'd probably also need an extra lock to protect against concurrent
> radix-tree update operations (should be pretty straightforward to add
> however, needing to protect *only* the radix-tree update calls).

That would seem to naturally be d->event_lock (I think [almost?] every
code path changed is already protected by it).

Jan
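P.S. To make the "no RCU needed" point concrete (re-using the made-up
names from the sketch above): with d->event_lock held across both
halves of an update, a delete completes fully - tree removal included -
before any subsequent insert for the same pirq can even look at the
tree, so no stale entry can be encountered and no grace period is
needed.

static int pirq_insert_locked(struct domain *d, struct irq_desc *desc,
                              struct radix_tree_root *tree,
                              unsigned long pirq, struct pirq_entry *e)
{
    int rc;

    spin_lock(&d->event_lock);      /* serialises all tree updates */
    rc = radix_tree_insert(tree, pirq, e);   /* e->live still false */
    if ( !rc )
    {
        spin_lock_irq(&desc->lock);
        e->live = true;             /* now visible to IRQ-context users */
        spin_unlock_irq(&desc->lock);
    }
    spin_unlock(&d->event_lock);
    return rc;
}

static void pirq_delete_locked(struct domain *d, struct irq_desc *desc,
                               struct radix_tree_root *tree,
                               unsigned long pirq)
{
    struct pirq_entry *e;

    spin_lock(&d->event_lock);
    spin_lock_irq(&desc->lock);
    e = radix_tree_lookup(tree, pirq);
    if ( e )
        e->live = false;            /* IRQ-context lookups now fail */
    spin_unlock_irq(&desc->lock);
    radix_tree_delete(tree, pirq);
    xfree(e);                       /* safe to free immediately */
    spin_unlock(&d->event_lock);
}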