Re: [Patch v5 2/5] x86/hpet: Use singe apic vector rather than irq_descs for HPET interrupts

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>, Keir Fraser <keir@xen.org>
Subject: Re: [Patch v5 2/5] x86/hpet: Use singe apic vector rather than irq_descs for HPET interrupts
Date: Fri, 22 Nov 2013 16:23:24 +0000	[thread overview]
Message-ID: <528F84FC.8000001@citrix.com> (raw)
In-Reply-To: <528F8A1A0200007800105F6E@nat28.tlf.novell.com>

On 22/11/13 15:45, Jan Beulich wrote:
>>>> On 14.11.13 at 17:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> The new logic is as follows:
>>  * A single high priority vector is allocated and uses on all cpus.
> Does this really need to be a high priority one? I'd think we'd be
> fine with the lowest priority one we can get, as we only need the
> wakeup here if nothing else gets a CPU to wake up.

Yes - absolutely.  We cannot have an HPET interrupt lower priority than
a guest line level interrupt.

Another cpu could be registered with our HPET channel to be worken up,
and we need to service them in a timely fashon.

>
>> +/* Wake up all cpus in the channel mask.  Lock should be held. */
>> +static void hpet_wake_cpus(struct hpet_event_channel *ch)
>>  {
>> -    unsigned int cpu = smp_processor_id();
>> +    cpuidle_wakeup_mwait(ch->cpumask);
>>  
>> -    if ( cpumask_test_and_clear_cpu(cpu, mask) )
>> -        raise_softirq(TIMER_SOFTIRQ);
>> -
>> -    cpuidle_wakeup_mwait(mask);
>> -
>> -    if ( !cpumask_empty(mask) )
>> -       cpumask_raise_softirq(mask, TIMER_SOFTIRQ);
>> +    if ( !cpumask_empty(ch->cpumask) )
>> +       cpumask_raise_softirq(ch->cpumask, TIMER_SOFTIRQ);
> Looks like the cpumask_empty() check isn't really needed here?

cpuidle_wakeup_mwait(mask) will only wake cpus who have set their bit in
cpuidle_mwait_flags.

There might be cpus who have registered for waking up who have not yet
set their bit in cpuidle_mwait_flags.

Out of caution, I did by best to avoid playing with the guts of the
timing code, as it seems quite fragile.

>
>> +/* HPET interrupt entry.  This is set up as a high priority vector. */
>> +static void do_hpet_irq(struct cpu_user_regs *regs)
>>  {
>> -    struct hpet_event_channel *ch = (struct hpet_event_channel *)data;
>> -
>> -    this_cpu(irq_count)--;
>> +    unsigned int cpu = smp_processor_id();
> This is being used just once, and hence rather pointless to have.

Fair point - this was left over from a previous version of the function
which did use cpu twice.

>
>> -    desc->handler = &hpet_msi_type;
>> -    ret = request_irq(ch->msi.irq, hpet_interrupt_handler, "HPET", ch);
>> -    if ( ret >= 0 )
>> -        ret = __hpet_setup_msi_irq(desc);
>> +    ret = hpet_setup_msi(ch);
> Why did you keep this? With that function now being called from
> hpet_broadcast_enter() I don't why you'd need this here (just
> like you're removing it from hpet_broadcast_resume() without
> replacement).

Because I optimised functions in the wrong order to obviously spot
this.  As we leave the MSI masked, this call to setup can be dropped.

I would however prefer not to manually inline hpet_setup_msi() into
hpet_broadcast_enter().  The compiler can do that for me, and it saves
making hpet_broadcast_enter() even more complicated.

>
>> @@ -574,6 +384,7 @@ void __init hpet_broadcast_init(void)
>>          /* Stop HPET legacy interrupts */
>>          cfg &= ~HPET_CFG_LEGACY;
>>          n = num_hpets_used;
>> +        free_channels = (1U << n) - 1;
> This is undefined when n == 32. Since n > 0, I'd suggest
>
>         free_channels = (u32)~0 >> (32 - n);

Ok

>
>> +    ch = hpet_get_free_channel();
>> +
>> +    if ( ch )
>> +    {
>> +        spin_lock(&ch->lock);
>> +
>> +        /* This really should be an MSI channel by this point */
>> +        ASSERT(!(ch->flags & HPET_EVT_LEGACY));
>> +
>> +        hpet_msi_mask(ch);
>> +
>> +        this_cpu(hpet_channel) = ch;
>> +        ch->cpu = cpu;
> I think even if benign, you should set ->cpu before storing into
> hpet_channel.

Ok

>
>> +        cpumask_set_cpu(cpu, ch->cpumask);
>> +
>> +        hpet_setup_msi(ch);
>> +        hpet_program_time(ch, deadline, NOW(), 1);
>> +        hpet_msi_unmask(ch);
>> +
>> +        spin_unlock(&ch->lock);
>> +    }
>> +    else
>> +    {
>> +        /* TODO - this seems very ugly */
> And you nevertheless want it committed this way?

Probably best the comment gets dropped.

>
>> +        s_time_t fallback_deadline = STIME_MAX;
>> +        unsigned int i, fallback_idx = -1;
>> +
>> +        for ( i = 0; i < num_hpets_used; ++i )
>> +        {
>> +            ch = &hpet_events[i];
>> +            spin_lock(&ch->lock);
>> +
>> +            if ( ch->cpu == -1 )
>> +                goto continue_search;
>> +
>> +            /* This channel is going to expire far too early */
>> +            if ( ch->next_event < deadline - MICROSECS(50) )
>> +                goto continue_search;
> So this is going to make us wake early. You remember that all this
> code exists solely for power saving purposes? Iiuc the CPU will
> then basically spin entering an idle state, until its actual wakeup
> time is reached.

What would you suggest instead?  We dont really want to be waking up late.

>
>> +
>> +            /* We can deal with being woken with 50us to spare */
>> +            if ( ch->next_event <= deadline )
>> +                break;
>> +
>> +            /* Otherwise record our best HPET to borrow. */
>> +            if ( ch->next_event <= fallback_deadline )
>> +            {
>> +                fallback_idx = i;
>> +                fallback_deadline = ch->next_event;
>> +            }
>> +
>> +        continue_search:
>> +            spin_unlock(&ch->lock);
>> +            ch = NULL;
>> +        }
>> +
>> +        if ( ch )
>> +        {
>> +            /* Found HPET with an appropriate time.  Request to be woken up */
>> +            cpumask_set_cpu(cpu, ch->cpumask);
>> +            this_cpu(hpet_channel) = ch;
>> +            spin_unlock(&ch->lock);
>> +        }
>> +        else if ( fallback_deadline < STIME_MAX && fallback_deadline != -1 )
>> +        {
>> +            /*
>> +             * Else we want to reprogram the fallback HPET sooner if possible,
>> +             * but with all the spinlocking, it just might have passed.
>> +             */
>> +            ch = &hpet_events[fallback_idx];
>> +
>> +            spin_lock(&ch->lock);
>>  
>> -    if ( !(ch->flags & HPET_EVT_LEGACY) )
>> -        hpet_attach_channel(cpu, ch);
>> +            if ( ch->cpu != -1 && ch->next_event == fallback_deadline )
> Can't this be "<= deadline", being quite a bit more flexible?

This is a test for whether ch is the one I identified as the best inside
the loop.  There should be sufficient flexibility inside the loop.

>
>> +            {
>> +                cpumask_set_cpu(cpu, ch->cpumask);
>> +                hpet_program_time(ch, deadline, NOW(), 1);
>> +            }
>> +            else
>> +                /* All else has failed.  Wander the idle loop again */
>> +                this_cpu(timer_deadline) = NOW() - 1;
>> +
>> +            spin_unlock(&ch->lock);
>> +        }
>> +        else
>> +            /* All HPETs were too soon.  Wander the idle loop again */
>> +            this_cpu(timer_deadline) = NOW() - 1;
> Here and above - what good will this do? Is this just in the
> expectation that the next time round a free channel will likely be
> found? If so, why can't you just go back to the start of the
> function.

Or a different HPET programmed to a different time which is now
compatible with our wakeup timeframe.

We cannot spin in this function, as we have interrupts disabled.

>
> Further - how do you see this going back to the idle loop? The
> way this is called from acpi/cpu_idle.c, you'll end up entering
> C3 anyway, with nothing going to wake you up. By proceeding to
> the end of the function, you even manage to disable the LAPIC
> timer.

Hmm - I will need to revisit this logic then.

>
>> +    /* If we own the channel, detach it */
>> +    if ( ch->cpu == cpu )
>> +    {
>> +        hpet_msi_mask(ch);
>> +        hpet_wake_cpus(ch);
>> +        ch->cpu = -1;
>> +        set_bit(ch->idx, &free_channels);
> Wouldn't there be less cache line bouncing if the "free" flag was just
> one of the channel flags?
>
> Jan

Yes, but then finding a free channel would involve searching each
hpet_channel rather than searching free_channels with ffs().

~Andrew

next prev parent reply	other threads:[~2013-11-22 16:23 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-13 17:59 [RFC v4 0/5] HPET fix interrupt logic Andrew Cooper
2013-11-13 17:59 ` [Patch v4 1/5] x86/hpet: Pre cleanup Andrew Cooper
2013-11-13 17:59 ` [Patch v4 2/5] x86/hpet: Use singe apic vector rather than irq_descs for HPET interrupts Andrew Cooper
2013-11-14 15:52   ` Tim Deegan
2013-11-14 15:56     ` Andrew Cooper
2013-11-14 16:01     ` [Patch v5 " Andrew Cooper
2013-11-22 15:45       ` Jan Beulich
2013-11-22 16:23         ` Andrew Cooper [this message]
2013-11-22 16:49           ` Jan Beulich
2013-11-22 17:38             ` Andrew Cooper
2013-11-25  7:52               ` Jan Beulich
2013-11-25  7:50           ` Jan Beulich
2013-11-26 18:32             ` Andrew Cooper
2013-11-27  8:35               ` Jan Beulich
2013-11-27 22:37                 ` Andrew Cooper
2013-11-28 14:33                   ` Jan Beulich
2013-11-28 15:06                     ` Andrew Cooper
2013-11-13 17:59 ` [Patch v4 3/5] x86/hpet: Post cleanup Andrew Cooper
2013-11-13 17:59 ` [Patch v4 4/5] x86/hpet: Debug and verbose hpet logging Andrew Cooper
2013-11-13 17:59 ` [Patch v4 5/5] x86/hpet: debug keyhandlers Andrew Cooper
  -- strict thread matches above, loose matches on Subject: below --
2014-03-05 15:43 [RFC v5 0/5] HPET fix interrupt logic Andrew Cooper
2014-03-05 15:43 ` [PATCH v5 2/5] x86/hpet: Use singe apic vector rather than irq_descs for HPET interrupts Andrew Cooper
2014-03-06 14:11   ` Tim Deegan
2014-03-06 14:33   ` Jan Beulich
2014-03-06 14:40     ` Andrew Cooper
2014-03-06 15:38       ` Jan Beulich
2014-03-06 16:08   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=528F84FC.8000001@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.