From: David Miller <davem@davemloft.net>
To: rusty@rustcorp.com.au
Cc: netdev@vger.kernel.org, shemminger@linux-foundation.org,
jgarzik@pobox.com, hadi@cyberus.ca
Subject: Re: [PATCH RFX]: napi_struct V3
Date: Tue, 24 Jul 2007 18:47:59 -0700 (PDT)
Message-ID: <20070724.184759.111203672.davem@davemloft.net>
In-Reply-To: <1185326149.1803.421.camel@localhost.localdomain>
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Wed, 25 Jul 2007 11:15:49 +1000
> If I understand correctly, you're looking at a general model like the
> following:
>
> while (more_packets()) { ... netif_receive_skb() }
>
> enable_rx_and_rxnobuf_ints();
>
> /* Lock protects against race w/ rx interrupt re-queueing us */
> spin_lock_irq();
> if (!more_packets())
>         netif_rx_complete(dev);
> else
>         /* We'll be scheduled again. */
>         disable_rx_and_rxnobuf_ints();
> spin_unlock_irq();
>
> Seems pretty robust to me. The race is probably pretty unusual, so the
> only downside is the locking overhead? Even non-irq-problematic drivers
> could use this (ie. virt_net.c probably wants to do it even though
> virtio implementation may not have this issue).
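
Spelled out with placeholder helpers (nothing driver-specific here,
just your pseudo-code made concrete against the new napi_struct API),
that model would look roughly like:

static int example_poll(struct napi_struct *napi, int budget)
{
        struct example_priv *p =
                container_of(napi, struct example_priv, napi);
        int work = 0;

        while (work < budget && example_more_packets(p)) {
                /* ... pull one frame and netif_receive_skb() it ... */
                work++;
        }

        if (work < budget) {
                example_enable_rx_and_rxnobuf_ints(p);

                /* Lock protects against a race with the RX interrupt
                 * re-queueing us.
                 */
                spin_lock_irq(&p->poll_lock);
                if (!example_more_packets(p))
                        netif_rx_complete(p->netdev, napi);
                else
                        /* We'll be scheduled again. */
                        example_disable_rx_and_rxnobuf_ints(p);
                spin_unlock_irq(&p->poll_lock);
        }

        return work;
}
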
Yes, interesting cases come up in the existing virtual drivers.
They do no locking at all :-)  EHEA is another case; in fact,
EHEA doesn't disable interrupts at all.

The reason is that EHEA has several NAPI sources behind one single
hardware interrupt.
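
Schematically (made-up names, not the real ehea code), its handler
has to fan that one interrupt out to several NAPI contexts, roughly:

static irqreturn_t example_shared_interrupt(int irq, void *dev_instance)
{
        struct example_adapter *adapter = dev_instance;
        int i;

        /* One hardware interrupt, several NAPI sources: schedule
         * every queue that has work, without touching the IRQ mask.
         */
        for (i = 0; i < adapter->num_queues; i++) {
                struct example_queue *q = &adapter->queue[i];

                if (example_queue_has_work(q) &&
                    netif_rx_schedule_prep(adapter->netdev, &q->napi))
                        __netif_rx_schedule(adapter->netdev, &q->napi);
        }

        return IRQ_HANDLED;
}
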
That driver's issues were discussed long ago, and they were actually
the initial impetus for Stephen to cut the first napi_struct patch.
Because of that weird layout, EHEA needs special consideration, which
I will get back to.

Anyways, here is how ibmveth.c looks right now in my tree:

static int ibmveth_poll(struct napi_struct *napi, int budget)
{
        struct ibmveth_adapter *adapter = container_of(napi, struct ibmveth_adapter, napi);
        struct net_device *netdev = adapter->netdev;
        int frames_processed = 0;
        unsigned long lpar_rc;

        do {
                struct sk_buff *skb;

                if (!ibmveth_rxq_pending_buffer(adapter))
                        break;

                rmb();
                if (!ibmveth_rxq_buffer_valid(adapter)) {
                        ...
                } else {
                        ...
                        frames_processed++;
                        ...
                }
        } while (frames_processed < budget);

        ibmveth_replenish_task(adapter);

        if (frames_processed < budget) {
                /* We think we are done - reenable interrupts,
                 * then check once more to make sure we are done.
                 */
                spin_lock_irq(&adapter->poll_lock);

                lpar_rc = h_vio_signal(adapter->vdev->unit_address,
                                       VIO_IRQ_ENABLE);

                ibmveth_assert(lpar_rc == H_SUCCESS);

                if (ibmveth_rxq_pending_buffer(adapter))
                        lpar_rc = h_vio_signal(adapter->vdev->unit_address,
                                               VIO_IRQ_DISABLE);
                else
                        netif_rx_complete(netdev, napi);

                spin_unlock_irq(&adapter->poll_lock);
        }

        return frames_processed;
}

static irqreturn_t ibmveth_interrupt(int irq, void *dev_instance)
{
        struct net_device *netdev = dev_instance;
        struct ibmveth_adapter *adapter = netdev->priv;
        unsigned long lpar_rc;

        spin_lock(&adapter->poll_lock);
        if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
                lpar_rc = h_vio_signal(adapter->vdev->unit_address, VIO_IRQ_DISABLE);
                ibmveth_assert(lpar_rc == H_SUCCESS);
                __netif_rx_schedule(netdev, &adapter->napi);
        }
        spin_unlock(&adapter->poll_lock);

        return IRQ_HANDLED;
}

A part of me looks at cases like this and almost believes that
the new spinlock is even superfluous. Consider:

1) ->poll() is guaranteed single-threaded wrt. itself

2) NAPI_STATE_SCHED provides implicit synchronization; it behaves
   as a lock

However, the lock is necessary to synchronize this sequence:

        reenable_interrupts();
        if (rx_pending())
                disable_interrupts();
        else
                netif_rx_complete(netdev, napi);

Any arriving interrupt, up until the moment netif_rx_complete()
is invoked, will fail the netif_rx_schedule_prep() test. So we
could miss a NAPI schedule without the lock.
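
To make the window concrete (just an illustration, not driver code):

        CPU0 (->poll)                       CPU1 (interrupt, no lock)
        -------------                       -------------------------
        h_vio_signal(..., VIO_IRQ_ENABLE)
        ibmveth_rxq_pending_buffer() == 0
                                            packet arrives, IRQ fires
                                            netif_rx_schedule_prep()
                                            fails (SCHED still set)
        netif_rx_complete()  <- SCHED cleared

The buffer that raised that interrupt now sits in the queue with
nothing scheduled to poll it.  Taking poll_lock in the interrupt
handler makes the other cpu wait until after netif_rx_complete(),
so its netif_rx_schedule_prep() succeeds and we get rescheduled.
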
One thing that's peculiar is that when netif_rx_schedule_prep()
fails, we don't disable interrupts, but we also don't clear the
condition that is causing the interrupt to occur.

This suggests that we can loop a lot, or even get stuck handling
the interrupt forever, when these scenarios trigger.  So at the
very least we need a local IRQ disable around the
netif_rx_complete() sequence.
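
Meaning something like this minimal variant (a sketch only, untested;
it only keeps the local cpu's interrupt out of the window, so it
assumes the IRQ is delivered to, or otherwise serialized with, the
cpu doing the ->poll()):

        if (frames_processed < budget) {
                unsigned long flags;

                /* Keep the reenable and the completion check atomic
                 * wrt. the interrupt handler on this cpu.
                 */
                local_irq_save(flags);

                lpar_rc = h_vio_signal(adapter->vdev->unit_address,
                                       VIO_IRQ_ENABLE);
                ibmveth_assert(lpar_rc == H_SUCCESS);

                if (ibmveth_rxq_pending_buffer(adapter))
                        lpar_rc = h_vio_signal(adapter->vdev->unit_address,
                                               VIO_IRQ_DISABLE);
                else
                        netif_rx_complete(netdev, napi);

                local_irq_restore(flags);
        }
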
Perhaps the lock is avoidable somehow, who knows :)