* Re: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Lino Sanfilippo @ 2014-12-19 22:24 UTC (permalink / raw)
To: Raghu Vatsavayi, davem
Cc: netdev, Derek Chickles, Satanand Burla, Felix Manlunas,
Raghu Vatsavayi, Stephen Hemminger
In-Reply-To: <1418959519-31681-1-git-send-email-rvatsavayi@caviumnetworks.com>
There seem to be various places where GFP_KERNEL could be used instead
of GFP_ATOMIC, e.g.:
> +static int setup_nic_devices(struct octeon_device *octeon_dev)
> +{
> +
> + sc = kmalloc(sizeof(*sc), GFP_ATOMIC);
> + if (!sc)
> + return -ENOMEM;
> +
and
> +static int liquidio_init_nic_module(int octeon_id, void *octeon_dev)
> +{
> + /* Allocate a soft command to be used to send link status requests
> + * to the core app.
> + */
> + sc = kmalloc(sizeof(*sc), GFP_ATOMIC);
> + if (!sc) {
> + kfree(ls);
> + return -ENOMEM;
also
> +/* Configure interrupt moderation parameters */
> +static int octnet_set_intrmod_cfg(void *oct, struct oct_intrmod_cfg *intr_cfg)
> +{
> + struct octeon_soft_command *sc;
> + struct oct_intrmod_cmd *cmd;
> + int retval;
> + struct octeon_device *oct_dev = (struct octeon_device *)oct;
> + unsigned char *cfg;
> +
> + /* Alloc soft command */
> + sc = (struct octeon_soft_command *)
> + kmalloc(sizeof(struct octeon_soft_command), GFP_ATOMIC);
> + if (!sc)
> + return -ENOMEM;
seems only to be called from process context.
Regards,
Lino
^ permalink raw reply
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-19 22:20 UTC (permalink / raw)
To: Lino Sanfilippo, Vatsavayi, Raghu, davem@davemloft.net
Cc: netdev@vger.kernel.org, Burla, Satananda, Manlunas, Felix,
Vatsavayi, Raghu, Stephen Hemminger
In-Reply-To: <5494A0DD.1010304@gmx.de>
> On 19.12.2014 04:25, Raghu Vatsavayi wrote:
> > +static void cn6xxx_disable_interrupt(void *chip)
> > +{
> > + struct octeon_cn6xxx *cn6xxx = (struct octeon_cn6xxx *)chip;
> > +
> > + /* Disable Interrupts */
> > + writeq(0, cn6xxx->intr_enb_reg64);
> > +}
> > +
>
> This could also be a good candidate for forced write posting. The code
> assumes that interrupts are actually deactivated after that.
>
> Regards,
> Lino
>
>
Good catch. We'll examine all the writeq and OCTEON_PCI_WIN_WRITE
calls to make sure there aren't any other critical posted write scenarios that
we've missed.
^ permalink raw reply
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-19 22:39 UTC (permalink / raw)
To: Lino Sanfilippo, Vatsavayi, Raghu, davem@davemloft.net
Cc: netdev@vger.kernel.org, Burla, Satananda, Manlunas, Felix,
Vatsavayi, Raghu
In-Reply-To: <54949D24.1040707@gmx.de>
> On 19.12.2014 04:25, Raghu Vatsavayi wrote:
> > +static int setup_nic_devices(struct octeon_device *octeon_dev)
> > +{
...
> > +
> > + sc = kmalloc(sizeof(*sc), GFP_ATOMIC);
> > + if (!sc)
> > + return -ENOMEM;
> > +
> > + resp = kmalloc(sizeof(*resp), GFP_KERNEL);
> > + if (!resp) {
> > + kfree(sc);
> > + return -ENOMEM;
> > + }
...
> > + init_waitqueue_head(&resp->s.wc);
...
> > +
> > + retval = octeon_send_soft_command(octeon_dev, sc);
> > + if (retval) {
> > + lio_dev_err(octeon_dev,
> > + "Link status instruction failed status: %x\n",
> > + retval);
> > + /* Soft instr is freed by driver in case of failure. */
> > + goto setup_nic_dev_fail;
> > + }
> > +
> > + /* Sleep on a wait queue till the cond flag indicates that the
> > + * response arrived or timed-out.
> > + */
> > + sleep_cond(&resp->s.wc, &resp->s.cond);
...
> > + kfree(sc);
> > + kfree(resp);
>
> I wonder if this is correct. AFAICS sc is the same memory that we put on
> the octeon_instr_queue (see send_soft_command() ->
> send_command()->add_to_nrlist()). Is that queue cleared before sc is
> freed?
>
>
Yes, it's correct. We wait for the response before exiting this loop with
that sleep_cond() call. This adds an entry to a wait_queue that is
serviced asynchronously and will return once it is complete.
^ permalink raw reply
* Re: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Lino Sanfilippo @ 2014-12-19 22:51 UTC (permalink / raw)
To: Raghu Vatsavayi, davem
Cc: netdev, Derek Chickles, Satanand Burla, Felix Manlunas,
Raghu Vatsavayi, stephen Hemminger
In-Reply-To: <1418959519-31681-1-git-send-email-rvatsavayi@caviumnetworks.com>
On 19.12.2014 04:25, Raghu Vatsavayi wrote:
> static int
> +oct_cfg_rx_intrcnt(struct lio *lio, struct ethtool_coalesce *intr_coal)
> +{
> + }
> +
> + spin_lock_irqsave(&cn6xxx->lock_for_droq_int_enb_reg, flags);
> + octeon_write_csr(oct, OCT_SLI_REGNAME(oct, PKT_CNT_INT_ENB), intr);
> + spin_unlock_irqrestore(&cn6xxx->lock_for_droq_int_enb_reg, flags);
> +
> + return 0;
What is the reason that this is locked? If it really has to be
synchronized then there should AFAIK at least be an mmiowb() to make
sure that the write does not leak out of the lock...
Regards,
Lino
^ permalink raw reply
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-19 22:53 UTC (permalink / raw)
To: Lino Sanfilippo, Vatsavayi, Raghu, davem@davemloft.net
Cc: netdev@vger.kernel.org, Burla, Satananda, Manlunas, Felix,
Vatsavayi, Raghu, Stephen Hemminger
In-Reply-To: <5494A58E.7000000@gmx.de>
> There seem to be various places where GFP_KERNEL could be used instead
> of GFP_ATOMIC, e.g.:
>
> > +static int setup_nic_devices(struct octeon_device *octeon_dev)
> > +{
>
> > +
> > + sc = kmalloc(sizeof(*sc), GFP_ATOMIC);
> > + if (!sc)
> > + return -ENOMEM;
> > +
>
> and
>
> > +static int liquidio_init_nic_module(int octeon_id, void *octeon_dev)
> > +{
>
> > + /* Allocate a soft command to be used to send link status requests
> > + * to the core app.
> > + */
> > + sc = kmalloc(sizeof(*sc), GFP_ATOMIC);
> > + if (!sc) {
> > + kfree(ls);
> > + return -ENOMEM;
>
> also
>
> > +/* Configure interrupt moderation parameters */
> > +static int octnet_set_intrmod_cfg(void *oct, struct oct_intrmod_cfg
> *intr_cfg)
> > +{
> > + struct octeon_soft_command *sc;
> > + struct oct_intrmod_cmd *cmd;
> > + int retval;
> > + struct octeon_device *oct_dev = (struct octeon_device *)oct;
> > + unsigned char *cfg;
> > +
> > + /* Alloc soft command */
> > + sc = (struct octeon_soft_command *)
> > + kmalloc(sizeof(struct octeon_soft_command), GFP_ATOMIC);
> > + if (!sc)
> > + return -ENOMEM;
>
> seems only to be called from process context.
>
> Regards,
> Lino
>
Thanks, you're right. I think also octnic_alloc_ctrl_pkt_sc() can be GFP_KERNEL.
We'll also examine the other two spots where we use GFP_ATOMIC and make
sure they are correct.
^ permalink raw reply
* [PATCH net] net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
From: Jay Vosburgh @ 2014-12-19 23:32 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:
[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
These are reliably reproduced in a network topology of:
container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet. Note that the warning is
associated with the container eth0.
The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.
The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb. __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.
This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
---
Please consider for 3.17 -stable; I do not see the warning on 3.14.
net/core/dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28..df755e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
skb_scrub_packet(skb, true);
skb->protocol = eth_type_trans(skb, dev);
+ skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
return 0;
}
--
1.9.1
^ permalink raw reply related
* virtio_net: Fix napi poll list corruption
From: Herbert Xu @ 2014-12-20 0:23 UTC (permalink / raw)
To: David Vrabel
Cc: netdev, david.vrabel, xen-devel, konrad.wilk, boris.ostrovsky,
edumazet, David S. Miller
In-Reply-To: <1418756386-6940-1-git-send-email-david.vrabel@citrix.com>
David Vrabel <david.vrabel@citrix.com> wrote:
> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt
> masking in NAPI) the napi instance is removed from the per-cpu list
> prior to calling the n->poll(), and is only requeued if all of the
> budget was used. This inadvertently broke netfront because netfront
> does not use NAPI correctly.
A similar bug exists in virtio_net.
-- >8 --
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
interrupt masking in NAPI) breaks virtio_net in an insidious way.
It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty. However, like some
other drivers virtio_net tries to do a last-ditch check and if
there is more work it will call napi_schedule and then immediately
process some of this new work. Should the entire budget be consumed
while processing such new work then we will violate the new caller
contract.
This patch fixes this by not touching any work when we reschedule
in virtio_net.
The worst part of this bug is that the list corruption causes other
napi users to be moved off-list. In my case I was chasing a stall
in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
was virtio_net which caused the stall even though the virtio_net
poll was still functioning perfectly after IPsec stalled.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8bd719..5ca9771 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
container_of(napi, struct receive_queue, napi);
unsigned int r, received = 0;
-again:
received += virtnet_receive(rq, budget - received);
/* Out of packets? */
@@ -771,7 +770,6 @@ again:
napi_schedule_prep(napi)) {
virtqueue_disable_cb(rq->vq);
__napi_schedule(napi);
- goto again;
}
}
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* Re: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Lino Sanfilippo @ 2014-12-20 0:33 UTC (permalink / raw)
To: Raghu Vatsavayi, davem
Cc: netdev, Derek Chickles, Satanand Burla, Felix Manlunas,
Raghu Vatsavayi, Stephen Hemminger
In-Reply-To: <1418959519-31681-1-git-send-email-rvatsavayi@caviumnetworks.com>
On 19.12.2014 04:25, Raghu Vatsavayi wrote:
> +sleep_cond(wait_queue_head_t *wait_queue, int *condition)
> +{
> + wait_queue_t we;
> +
> + init_waitqueue_entry(&we, current);
> + add_wait_queue(wait_queue, &we);
> + set_current_state(TASK_INTERRUPTIBLE);
> + while (!(ACCESS_ONCE(*condition)))
> + schedule();
> + set_current_state(TASK_RUNNING);
> + remove_wait_queue(wait_queue, &we);
> +}
This looks as if we should at least check for pending signals, otherwise
there is no point in waiting interruptible.
Maybe wait_event_interruptible() is what we really want here...
Regards,
Lino
^ permalink raw reply
* net: Detect drivers that reschedule NAPI and exhaust budget
From: Herbert Xu @ 2014-12-20 0:36 UTC (permalink / raw)
To: David Vrabel
Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet,
David S. Miller
In-Reply-To: <20141220002327.GA31975@gondor.apana.org.au>
On Sat, Dec 20, 2014 at 11:23:27AM +1100, Herbert Xu wrote:
>
> A similar bug exists in virtio_net.
In order to detect other drivers doing this we should add something
like this.
-- >8 --
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
interrupt masking in NAPI) required drivers to leave poll_list
empty if the entire budget is consumed.
We have already had two broken drivers so let's add a check for
this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28..88f9725 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h)
*/
napi_gro_flush(n, HZ >= 1000);
}
- list_add_tail(&n->poll_list, &repoll);
+ /* Some drivers may have called napi_schedule
+ * prior to exhausting their budget.
+ */
+ if (!WARN_ON_ONCE(!list_empty(&n->poll_list)))
+ list_add_tail(&n->poll_list, &repoll);
}
}
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-20 0:47 UTC (permalink / raw)
To: Lino Sanfilippo, Vatsavayi, Raghu, davem@davemloft.net
Cc: netdev@vger.kernel.org, Burla, Satananda, Manlunas, Felix,
Vatsavayi, Raghu, stephen Hemminger
In-Reply-To: <5494ABE3.5070003@gmx.de>
> > static int
> > +oct_cfg_rx_intrcnt(struct lio *lio, struct ethtool_coalesce *intr_coal)
> > +{
>
> > + }
> > +
> > + spin_lock_irqsave(&cn6xxx->lock_for_droq_int_enb_reg, flags);
> > + octeon_write_csr(oct, OCT_SLI_REGNAME(oct, PKT_CNT_INT_ENB),
> intr);
> > + spin_unlock_irqrestore(&cn6xxx->lock_for_droq_int_enb_reg,
> flags);
> > +
> > + return 0;
>
> What is the reason that this is locked? If it really has to be
> synchronized then there should AFAIK at least be an mmiowb() to make
> sure that the write does not leak out of the lock...
>
> Regards,
> Lino
>
Yes, we need this locked because the interrupt handler accesses this as well,
But you are right we need the mmiowb().
Thanks,
Derek
^ permalink raw reply
* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Williams, Kenneth @ 2014-12-20 0:57 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jiri Pirko, B Viswanath, Samudrala, Sridhar, John Fastabend,
Varlese, Marco, netdev@vger.kernel.org, Thomas Graf,
sfeldma@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <549450D0.8030805@cumulusnetworks.com>
On Fri, Dec 19, 2014 at 08:22:40AM -0800, Roopa Prabhu wrote:
> On 12/19/14, 1:55 AM, Jiri Pirko wrote:
> >Fri, Dec 19, 2014 at 10:35:27AM CET, marichika4@gmail.com wrote:
> >>On 19 December 2014 at 14:53, Jiri Pirko <jiri@resnulli.us> wrote:
> >>>Fri, Dec 19, 2014 at 10:01:46AM CET, marichika4@gmail.com wrote:
> >>>>On 19 December 2014 at 13:57, Jiri Pirko <jiri@resnulli.us> wrote:
> >>>>>Fri, Dec 19, 2014 at 06:14:57AM CET, marichika4@gmail.com wrote:
> >>>>>>On 19 December 2014 at 05:18, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> >>>>>>>On 12/18/14, 3:26 PM, Samudrala, Sridhar wrote:
> >>>><snipped for ease of reading>
> >>>>>>>>
> >>>>>>>>We also need an interface to set per-switch attributes. Can this work?
> >>>>>>>> bridge link set dev sw0 sw_attr bcast_flooding 1 master
> >>>>>>>>where sw0 is a bridge representing the hardware switch.
> >>>>>>>
> >>>>>>>Not today. We discussed this @ LPC, and one way to do this would be to have
> >>>>>>>a device
> >>>>>>>representing the switch asic. This is in the works.
> >>>>>>
> >>>>>>Can I assume that on platforms which house more than one asic (say
> >>>>>>two 24 port asics, interconnected via a 10G link or equivalent, to get
> >>>>>>a 48 port 'switch') , the 'rocker' driver (or similar) should expose
> >>>>>>them as a single set of ports, and not as two 'switch ports' ?
> >>>>>Well that really depends on particular implementation and drivers. If you
> >>>>>have 2 pci-e devices, I think you should expose them as 2 entities. For
> >>>>>sure, you can have the driver to do the masking for you. I don't believe
> >>>>>that is correct though.
> >>>>>
> >>>>In a platform that houses two asic chips, IMO, the user is still
> >>>>expected to manage the router as a single entity. The configuration
> >>>>being applied on both asic devices need to be matching if not
> >>>>identical, and may not be conflicting. The FDB is to be synchronized
> >>>>so that (offloaded) switching can happen across the asics. Some of
> >>>>this stuff is asic specific anyway. Another example is that of the
> >>>>learning. The (hardware) learning can't be enabled on one asic, while
> >>>>being disabled on another one. The general use cases I have seen are
> >>>>all involving managing the 'router' as a single entity. That the
> >>>>'router' is implemented with two asics instead of a single asic (with
> >>>>more ports) is to be treated as an implementation detail. This is the
> >>>>usual router management method that exists today.
> >>>>
> >>>>I hope I make sense.
> >>>>
> >>>>So I am trying to figure out what this single entity that will be used
> >>>>from a user perspective. It can be a bridge, but our bridges are more
> >>>>802.1q bridges. We can use the 'self' mode, but then it means that it
> >>>>should reflect the entire port count, and not just an asic.
> >>>>
> >>>>So I was trying to deduce that in our switchdevice model, the best bet
> >>>>would be to leave the unification to the driver (i.e., to project the
> >>>>multiple physical asics as a single virtual switch device). Thist
> >>>Is it possible to have the asic as just single one? Or is it possible to
> >>>connect asics being multiple chips maybe from multiple vendors together?
> >>I didn't understand the first question. Some times, it is possible to
> >I ment that there is a design with just a single asic of this type,
> >instead of a pair.
> >
> >>have a single asic replace two, but its a cost factor, and others that
> >>are involved.
> >>
> >>AFAIK, the answer to the second question is a No. Two asics from
> >>different vendors may not be connected together. The interconnect
> >>tends to be proprietary.
> >Okay. In that case, it might make sense to mask it on driver level.
> >
> >
> >>>I believe that answer is "yes" in both cases. Making two separate asics
> >>>to appear as one for user is not correct in my opinion. Driver should
> >>>not do such masking. It is unclean, unextendable.
> >>>
> >>I am only looking for a single management entity. I am not thinking it
> >>needs to be at driver level. I am not sure of any other option apart
> >>from creating a 'switchdev' that Roopa was mentioning.
> >
> >Well the thing is there is a common desire to make the offloading as
> >transparent as possible. For example, have 4 ports of same switch and
> >put them into br0. Just like that, without need to do anything else
> >than you would do when bridging ordinary NICs. Introducing some
> >"management entity" would break this approach.
> >
> I don't think having a switchdevice breaks this approach. A software bridge
> is not a 1-1 mapping with the asic in all cases.
> When its a vlan filtering bridge, yes, it is (In which case all switch
> global l2 non-port specific attributes can be applied to the bridge).
>
> The switch asic can do l2 and l3 too. For a bridge, the switch asic is just
> accelerating l2.
> And a switch asic is also capable of l3, acls. A switch device (whether
> accessible to userspace or not)
> may become necessary (as discussed in other threads) where you cannot
> resolve a kernel object to a switch port (Global acl rules, unresolved route
> nexthops etc).
A switch-chip vendor that provides a proprietary mechanism of
bonding two or more switch-chips into a single functional unit, also
typically provides an API that allows operating on this bonded set of
switch chips to be addressed as a single unit. If my understanding is
correct, the question of port uniqueness, etc becomes moot.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: net: Detect drivers that reschedule NAPI and exhaust budget
From: Eric Dumazet @ 2014-12-20 1:34 UTC (permalink / raw)
To: Herbert Xu
Cc: David Vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky,
edumazet, David S. Miller
In-Reply-To: <20141220003636.GA32187@gondor.apana.org.au>
On Sat, 2014-12-20 at 11:36 +1100, Herbert Xu wrote:
> On Sat, Dec 20, 2014 at 11:23:27AM +1100, Herbert Xu wrote:
> >
> > A similar bug exists in virtio_net.
>
> In order to detect other drivers doing this we should add something
> like this.
>
> -- >8 --
> The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
> interrupt masking in NAPI) required drivers to leave poll_list
> empty if the entire budget is consumed.
>
> We have already had two broken drivers so let's add a check for
> this.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f411c28..88f9725 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h)
> */
> napi_gro_flush(n, HZ >= 1000);
> }
> - list_add_tail(&n->poll_list, &repoll);
> + /* Some drivers may have called napi_schedule
> + * prior to exhausting their budget.
> + */
> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list)))
> + list_add_tail(&n->poll_list, &repoll);
> }
> }
>
I do not think stack trace will point to the buggy driver ?
IMO it would be better to print a single line with the netdev name ?
^ permalink raw reply
* [PATCH net-next] hyperv: Fix some variable name typos in send-buffer init/revoke
From: Haiyang Zhang @ 2014-12-20 2:25 UTC (permalink / raw)
To: davem, netdev; +Cc: olaf, jasowang, driverdev-devel, linux-kernel, haiyangz
The changed names are union fields with the same size, so the existing code
still works. But, we now update these variables to the correct names.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
---
drivers/net/hyperv/hyperv_net.h | 1 +
drivers/net/hyperv/netvsc.c | 15 ++++++++-------
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 2f48f79..384ca4f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -590,6 +590,7 @@ struct nvsp_message {
#define NETVSC_RECEIVE_BUFFER_ID 0xcafe
+#define NETVSC_SEND_BUFFER_ID 0
#define NETVSC_PACKET_SIZE 4096
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index dd867e6..9f49c01 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -161,8 +161,8 @@ static int netvsc_destroy_buf(struct netvsc_device *net_device)
/* Deal with the send buffer we may have setup.
* If we got a send section size, it means we received a
- * SendsendBufferComplete msg (ie sent
- * NvspMessage1TypeSendReceiveBuffer msg) therefore, we need
+ * NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE msg (ie sent
+ * NVSP_MSG1_TYPE_SEND_SEND_BUF msg) therefore, we need
* to send a revoke msg here
*/
if (net_device->send_section_size) {
@@ -172,7 +172,8 @@ static int netvsc_destroy_buf(struct netvsc_device *net_device)
revoke_packet->hdr.msg_type =
NVSP_MSG1_TYPE_REVOKE_SEND_BUF;
- revoke_packet->msg.v1_msg.revoke_recv_buf.id = 0;
+ revoke_packet->msg.v1_msg.revoke_send_buf.id =
+ NETVSC_SEND_BUFFER_ID;
ret = vmbus_sendpacket(net_device->dev->channel,
revoke_packet,
@@ -204,7 +205,7 @@ static int netvsc_destroy_buf(struct netvsc_device *net_device)
net_device->send_buf_gpadl_handle = 0;
}
if (net_device->send_buf) {
- /* Free up the receive buffer */
+ /* Free up the send buffer */
vfree(net_device->send_buf);
net_device->send_buf = NULL;
}
@@ -339,9 +340,9 @@ static int netvsc_init_buf(struct hv_device *device)
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_SEND_BUF;
- init_packet->msg.v1_msg.send_recv_buf.gpadl_handle =
+ init_packet->msg.v1_msg.send_send_buf.gpadl_handle =
net_device->send_buf_gpadl_handle;
- init_packet->msg.v1_msg.send_recv_buf.id = 0;
+ init_packet->msg.v1_msg.send_send_buf.id = NETVSC_SEND_BUFFER_ID;
/* Send the gpadl notification request */
ret = vmbus_sendpacket(device->channel, init_packet,
@@ -364,7 +365,7 @@ static int netvsc_init_buf(struct hv_device *device)
netdev_err(ndev, "Unable to complete send buffer "
"initialization with NetVsp - status %d\n",
init_packet->msg.v1_msg.
- send_recv_buf_complete.status);
+ send_send_buf_complete.status);
ret = -EINVAL;
goto cleanup;
}
--
1.7.1
^ permalink raw reply related
* Re: net: Detect drivers that reschedule NAPI and exhaust budget
From: David Miller @ 2014-12-20 2:40 UTC (permalink / raw)
To: eric.dumazet
Cc: herbert, david.vrabel, netdev, xen-devel, konrad.wilk,
boris.ostrovsky, edumazet
In-Reply-To: <1419039288.11185.4.camel@edumazet-glaptop2.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 19 Dec 2014 17:34:48 -0800
>> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h)
>> */
>> napi_gro_flush(n, HZ >= 1000);
>> }
>> - list_add_tail(&n->poll_list, &repoll);
>> + /* Some drivers may have called napi_schedule
>> + * prior to exhausting their budget.
>> + */
>> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list)))
>> + list_add_tail(&n->poll_list, &repoll);
>> }
>> }
>>
>
> I do not think stack trace will point to the buggy driver ?
>
> IMO it would be better to print a single line with the netdev name ?
Right, we are already back from the poll routine and will just end
up seeing the call trace leading to the software interrupt.
^ permalink raw reply
* OVS + BPF, make sense?
From: Andy Zhou @ 2014-12-20 2:49 UTC (permalink / raw)
To: dev-yBygre7rU0SM8Zsap4Y0gw@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi, OVS and netdev developers:
On 12/17/2014, Alexei Starovoiov and a few of the OVS developers
(Joe, Andy, Jesse, Pravin and Justin) got together to discuss possible
ways for OVS to harness the power BPF in recent Linux kernels. During
the meeting, we feel that the content of the discussion may be of
interest to many OVS and Linux kernel developers, it should be a good
idea to post the meeting minutes.
The meeting minutes can be found below. I cross post them to both
ovs-dev and netdev mailing list. Apologises if you receive this email
twice.
We don't have a concrete plan at this point on how BPF can be applied.
However we are interested in exploring further and exchange ideas with
the developer communities. We can
probably meet again early 2015 if there is sufficient interest in this topic.
Regards,
Andy
BPF current status:
===========================
* Linux kernel up-streaming on going, currently focus on tracing.
Other enhancements
planed, for example, JIT opcode obfuscation, as a security enhancements.
* First set of LLVM upstreaming should land in Q1'15. More
enhancements will follow.
* GCC backend is possible, but not planed at this time.
* New features planed: per-cpu data structures, streaming interfaces
reusing trace buffer infrasturucture.
Possible use cases of BPF in OVS Linux kernel datapath
===========================================
1. Using BPF to implement a single action:
It may make sense for OVS to have its own program type. However,
bpf_register_prog_type() API currently is not exported. This
means the program type and related helper function can not be provided
by the OVS kernel module, but has be be up-streamed into the kernel
core. This may affect how OVS kernel module can provide backward
compatibility. Alexei explained the this is mainly driven by the
concerns of the
complexity related to tracking module load/unload while BPF program
are running, and
the concern of possible side-stepping of GPL.
BPF action may need to access kernel data structures, such as skb,
in kernel version agnostic ways. Alexei is aware of this requirement,
and is considering multiple potential solutions A) bpf helper
functions, b) using pseudo skb, c) ask kernel about offsets for each
interested data field... This is work in progress.
2. Using BPF to implement the entire action list:
This is a bigger task than 1, but can bring more benefits of BPF
to OVS. Current ovs action list are sequentially executed. BPF
provides if-then-else and other types of control capabilities to
'upgrade' action list to a true program.
Those BPF programs needs to be generated at run time by OVS users
space. Alexei thinks this may not be hard within the scope of current
OVS actions. Jesse suggested to reference libpcap style of program generation.
3. Using BPF to implement ovs flow extract
Flow extract functions are sweet spots for applying BPF. BPF can
be the backend of current OpenFlow match parser, or even the back end
of a more flexible parser such as P4.
4. Using BPF to implement overall OVS kernel module functionality
Alexei likes this approach the most. The potential benefits are:
a) flexible parser and flow data structure
b) user space and kernel data structures are always in-sync, thus
removing the complexity of version compatibility handling and error
checking.
c) possible higher performance than current kernel module, with
JITed BPF code.
d) The helper functions can be more easily planned out. This can be
important in case dynamic helper function registration is not
possible.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: net: Detect drivers that reschedule NAPI and exhaust budget
From: Herbert Xu @ 2014-12-20 6:55 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk,
boris.ostrovsky, edumazet
In-Reply-To: <20141219.214000.819506179607476836.davem@davemloft.net>
On Fri, Dec 19, 2014 at 09:40:00PM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 19 Dec 2014 17:34:48 -0800
>
> >> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h)
> >> */
> >> napi_gro_flush(n, HZ >= 1000);
> >> }
> >> - list_add_tail(&n->poll_list, &repoll);
> >> + /* Some drivers may have called napi_schedule
> >> + * prior to exhausting their budget.
> >> + */
> >> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list)))
> >> + list_add_tail(&n->poll_list, &repoll);
> >> }
> >> }
> >>
> >
> > I do not think stack trace will point to the buggy driver ?
> >
> > IMO it would be better to print a single line with the netdev name ?
>
> Right, we are already back from the poll routine and will just end
> up seeing the call trace leading to the software interrupt.
Good point Eric.
-- >8 --
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
interrupt masking in NAPI) required drivers to leave poll_list
empty if the entire budget is consumed.
We have already had two broken drivers so let's add a check for
this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28..47fdc5c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4620,7 +4620,13 @@ static void net_rx_action(struct softirq_action *h)
*/
napi_gro_flush(n, HZ >= 1000);
}
- list_add_tail(&n->poll_list, &repoll);
+ /* Some drivers may have called napi_schedule
+ * prior to exhausting their budget.
+ */
+ if (unlikely(!list_empty(&n->poll_list)))
+ pr_warn("%s: Budget exhausted after napi rescheduled\n", n->dev ? n->dev->name : "backlog");
+ else
+ list_add_tail(&n->poll_list, &repoll);
}
}
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* [PATCH]e100 in linux-3.18.0: some potential bugs
From: Jia-Ju Bai @ 2014-12-20 7:40 UTC (permalink / raw)
To: todd.fujinaka; +Cc: netdev, Linux-nics, linux.nics, e1000-devel
[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]
I have actually tested e100 driver on the real hardware(Intel 82559 PCI
Ethernet Controller), and find some bugs:
The target file is drivers/net/ethernet/intel/e100.c, which is used to build
e100.ko.
(1) The function pci_pool_create is called by e100_probe when initializing
the ethernet card driver. But when pci_pool_create is failed, which means
that it returns NULL to nic->cbs_pool, the system crash will happen. Because
pci_pool_alloc (in e100_alloc_cbs in e100_up in e100_open) need to use
nic->cbs_pool to allocate the resource, but it is NULL. I suggest that a
check can be added in the code to detect whether pci_pool_create returns
NULL.
(2) In the normal process, netif_napi_add is called in e100_probe, but
netif_napi_del is not called in e100_remove. However, many other ethernet
card drivers call them in pairs, even in the error handling paths, such as
r8169 and igb.
Meanwhile, I also write the patch to fix the bugs. I have run the patch on
the hardware, it can work normally and fix the above bugs.
diff --git a/drivers/net/ethernet/intel/e100.c
b/drivers/net/ethernet/intel/e100.c
index 781065e..2631d3f 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -2969,6 +2969,11 @@ static int e100_probe(struct pci_dev *pdev, const
struct pci_device_id *ent)
nic->params.cbs.max * sizeof(struct cb),
sizeof(u32),
0);
+ if(!(nic->cbs_pool))
+ {
+ err = -ENOMEM;
+ goto err_out_pool;
+ }
netif_info(nic, probe, nic->netdev,
"addr 0x%llx, irq %d, MAC addr %pM\n",
(unsigned long long)pci_resource_start(pdev, use_io ? 1 :
0),
@@ -2976,6 +2981,8 @@ static int e100_probe(struct pci_dev *pdev, const
struct pci_device_id *ent)
return 0;
+err_out_pool:
+ unregister_netdev(netdev);
err_out_free:
e100_free(nic);
err_out_iounmap:
@@ -2985,6 +2992,7 @@ err_out_free_res:
err_out_disable_pdev:
pci_disable_device(pdev);
err_out_free_dev:
+ netif_napi_del(&nic->napi);
free_netdev(netdev);
return err;
}
@@ -2995,6 +3003,7 @@ static void e100_remove(struct pci_dev *pdev)
if (netdev) {
struct nic *nic = netdev_priv(netdev);
+ netif_napi_del(&nic->napi);
unregister_netdev(netdev);
e100_free(nic);
pci_iounmap(pdev, nic->csr);
[-- Attachment #2: patch_e100 --]
[-- Type: application/octet-stream, Size: 1265 bytes --]
diff --git a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
index 781065e..2631d3f 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -2969,6 +2969,11 @@ static int e100_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
nic->params.cbs.max * sizeof(struct cb),
sizeof(u32),
0);
+ if(!(nic->cbs_pool))
+ {
+ err = -ENOMEM;
+ goto err_out_pool;
+ }
netif_info(nic, probe, nic->netdev,
"addr 0x%llx, irq %d, MAC addr %pM\n",
(unsigned long long)pci_resource_start(pdev, use_io ? 1 : 0),
@@ -2976,6 +2981,8 @@ static int e100_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
+err_out_pool:
+ unregister_netdev(netdev);
err_out_free:
e100_free(nic);
err_out_iounmap:
@@ -2985,6 +2992,7 @@ err_out_free_res:
err_out_disable_pdev:
pci_disable_device(pdev);
err_out_free_dev:
+ netif_napi_del(&nic->napi);
free_netdev(netdev);
return err;
}
@@ -2995,6 +3003,7 @@ static void e100_remove(struct pci_dev *pdev)
if (netdev) {
struct nic *nic = netdev_priv(netdev);
+ netif_napi_del(&nic->napi);
unregister_netdev(netdev);
e100_free(nic);
pci_iounmap(pdev, nic->csr);
^ permalink raw reply related
* [PATCH] e1000 in linux-3.18.0: a potential bug
From: Jia-Ju Bai @ 2014-12-20 7:50 UTC (permalink / raw)
To: todd.fujinaka, netdev; +Cc: linux.nics, e1000-devel
I have actually tested e1000 driver on the real hardware(Intel 82540EM PCI
Gigabit Ethernet Controller), and find a potential bug:
The target file is drivers/net/ethernet/intel/e1000/e1000_main.c, which is
used to build e1000.ko.
(1) In the normal process, netif_napi_add is called in e1000_probe, but
netif_napi_del is not called in e1000_remove. However, many other ethernet
card drivers call them in pairs, even in the error handling paths, such as
r8169 and igb.
Meanwhile, I also write the patch to fix the bug. I have run the patch on
the hardware, it can work normally and fix the above bug.
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 24f3986..f6def7b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1004,7 +1004,7 @@ static int e1000_probe(struct pci_dev *pdev, const
struct pci_device_id *ent)
/* make ready for any if (hw->...) below */
err = e1000_init_hw_struct(adapter, hw);
if (err)
- goto err_sw_init;
+ goto err_dma;
/* there is a workaround being applied below that limits
* 64-bit DMA addresses to 64-bit hardware. There are some
@@ -1239,8 +1239,9 @@ err_eeprom:
iounmap(hw->flash_address);
kfree(adapter->tx_ring);
kfree(adapter->rx_ring);
-err_dma:
err_sw_init:
+ netif_napi_del(&adapter->napi);
+err_dma:
err_mdio_ioremap:
iounmap(hw->ce4100_gbe_mdio_base_virt);
iounmap(hw->hw_addr);
@@ -1271,6 +1272,7 @@ static void e1000_remove(struct pci_dev *pdev)
e1000_down_and_stop(adapter);
e1000_release_manageability(adapter);
+ netif_napi_del(&adapter->napi);
unregister_netdev(netdev);
e1000_phy_hw_reset(hw);
Thanks!
^ permalink raw reply related
* [PATCH] e1000e in linux-3.18.0: some potential bugs
From: Jia-Ju Bai @ 2014-12-20 8:02 UTC (permalink / raw)
To: todd.fujinaka, netdev; +Cc: e1000-devel, linux.nics
I have actually tested e1000e driver on the real hardware(Intel 82572EI
PCI-E Gigabit Ethernet Controller), and find some potential bugs:
The target file is drivers/net/ethernet/intel/e1000e/netdev.c, which is used
to build e1000e.ko.
(1) In the normal process, netif_napi_add is called in e1000_probe, but
netif_napi_del is not called in e1000_remove. However, many other ethernet
card drivers call them in pairs, even in the error handling paths, such as
r8169 and igb.
(2) The function vzalloc is called by e1000e_setup_rx_resources (in
e1000_open) when initializing the ethernet card driver. But when vzalloc is
failed, "err" segment in e1000e_setup_rx_resources is executed to return and
then e1000e_free_tx_resources in "err_setup_rx" segment in e1000_open is
executed to halt. However, "writel(0, tx_ring->head)" statement in
e1000_clean_tx_ring in e1000e_free_tx_resources will cause system crash,
because "tx_ring->head" is not assigned the value. In the code,
"tx_ring->head" is initialized in e1000_configure_tx in e1000_configure
after the e1000e_setup_rx_resources.
(3) The same system crashes happens, when kcalloc in
e1000e_setup_rx_resources is failed(returns NULL).
(4) The same system crashes happens, when e1000_alloc_ring_dma in
e1000e_setup_rx_resources is failed(returns error code).
(5) In the normal process of e1000e, pci_enable_pcie_error_reporting and
pci_disable_pcie_error_reporting is called in pairs in e1000_probe and
e1000_remove. However, when pci_enable_pcie_error_reporting has been called
and pci_save_state in e1000_probe is failed, "err_alloc_etherdev" segment in
e1000_probe is executed immediately to exit, but
pci_disable_pcie_error_reporting is not called.
(6) The same situation happens when alloc_etherdev_mqs in e1000_probe is
failed.
(7) The same situation happens when ioremap in e1000_probe is failed.
(8) The same situation happens when e1000_sw_init in e1000_probe is failed.
(9) The same situation happens when register_netdev in e1000_probe is
failed.
(10) When request_irq in e1000_request_irq is failed, pm_qos_add_request in
e1000_open is called, but pm_qos_remove_request is not called.
Meanwhile, I also write the patch to fix the bugs. I have run the patch on
the hardware, it can work normally and fix the above bugs.
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 247335d..02d1e67 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2444,6 +2444,8 @@ static void e1000_clean_tx_ring(struct e1000_ring
*tx_ring)
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
+ if(!(tx_ring->head))
+ return;
writel(0, tx_ring->head);
if (adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
e1000e_update_tdt_wa(tx_ring, 0);
@@ -4358,11 +4360,13 @@ static int e1000_open(struct net_device *netdev)
netif_carrier_off(netdev);
/* allocate transmit descriptors */
+ adapter->tx_ring->head = NULL;
err = e1000e_setup_tx_resources(adapter->tx_ring);
if (err)
goto err_setup_tx;
/* allocate receive descriptors */
+ adapter->rx_ring->head = NULL;
err = e1000e_setup_rx_resources(adapter->rx_ring);
if (err)
goto err_setup_rx;
@@ -4430,6 +4434,7 @@ static int e1000_open(struct net_device *netdev)
return 0;
err_req_irq:
+ pm_qos_remove_request(&adapter->netdev->pm_qos_req);
e1000e_release_hw_control(adapter);
e1000_power_down_phy(adapter);
e1000e_free_rx_resources(adapter->rx_ring);
@@ -7045,6 +7050,7 @@ err_hw_init:
kfree(adapter->tx_ring);
kfree(adapter->rx_ring);
err_sw_init:
+ netif_napi_del(&adapter->napi);
if (adapter->hw.flash_address)
iounmap(adapter->hw.flash_address);
e1000e_reset_interrupt_capability(adapter);
@@ -7053,6 +7059,7 @@ err_flashmap:
err_ioremap:
free_netdev(netdev);
err_alloc_etherdev:
+ pci_disable_pcie_error_reporting(pdev);
pci_release_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_MEM));
err_pci_reg:
@@ -7103,6 +7110,7 @@ static void e1000_remove(struct pci_dev *pdev)
/* Don't lie to e1000_close() down the road. */
if (!down)
clear_bit(__E1000_DOWN, &adapter->state);
+ netif_napi_del(&adapter->napi);
unregister_netdev(netdev);
if (pci_dev_run_wake(pdev))
Thanks!
^ permalink raw reply related
* [PATCH] igb in linux-3.18.0: some potential bugs
From: Jia-Ju Bai @ 2014-12-20 8:11 UTC (permalink / raw)
To: todd.fujinaka, netdev; +Cc: e1000-devel, linux.nics
I have actually tested igb driver on the real hardware(Intel 82575EB PCI-E
Gigabit Ethernet Controller), and find some potential bugs:
The target file is drivers/net/ethernet/intel/igb/igb_main.c
(1) In the normal process of igb, pci_enable_pcie_error_reporting and
pci_disable_pcie_error_reporting is called in pairs in igb_probe and
igb_remove. However, when pci_enable_pcie_error_reporting has been called
and alloc_etherdev_mqs in igb_probe is failed, "err_alloc_etherdev" segment
in igb_probe is executed immediately to exit, but
pci_disable_pcie_error_reporting is not called.
(2) The same situation happens when pci_iomap in igb_probe is failed.
(3) The same situation happens when igb_sw_init in igb_probe is failed.
(4) The same situation happens when register_netdev in igb_probe is failed.
(5) The same situation happens when igb_init_i2c in igb_probe is failed.
(6) The function kcalloc is called by igb_sw_init when initializing the
ethernet card driver, but kfree is not called when register_netdev in
igb_probe is failed, which may cause memory leak.
(7) The same situation happens when igb_init_i2c in igb_probe is failed.
(8) The same situation happens when kzalloc in igb_alloc_q_vector is failed.
(9) The same situation happens when igb_alloc_q_vector in
igb_alloc_q_vectors is failed.
(10) When igb_init_i2c in igb_probe is failed, igb_enable_sriov is called in
igb_probe_vfs, but igb_disable_sriov is not called.
(11) The same situation with [10] happens when register_netdev in igb_probe
is failed.
Meanwhile, I also write the patch to fix the bugs. I have run the patch on
the hardware, it can work normally and fix the above bugs.
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
b/drivers/net/ethernet/intel/igb/igb_main.c
index 487cd9c..cd9364a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -179,6 +179,7 @@ static void igb_check_vf_rate_limit(struct igb_adapter
*);
#ifdef CONFIG_PCI_IOV
static int igb_vf_configure(struct igb_adapter *adapter, int vf);
static int igb_pci_enable_sriov(struct pci_dev *dev, int num_vfs);
+static int igb_disable_sriov(struct pci_dev *pdev);
#endif
#ifdef CONFIG_PM
@@ -2653,17 +2654,22 @@ err_register:
igb_release_hw_control(adapter);
memset(&adapter->i2c_adap, 0, sizeof(adapter->i2c_adap));
err_eeprom:
+#ifdef CONFIG_PCI_IOV
+ igb_disable_sriov(pdev);
+#endif
if (!igb_check_reset_block(hw))
igb_reset_phy(hw);
if (hw->flash_address)
iounmap(hw->flash_address);
err_sw_init:
+ kfree(adapter->shadow_vfta);
igb_clear_interrupt_scheme(adapter);
pci_iounmap(pdev, hw->hw_addr);
err_ioremap:
free_netdev(netdev);
err_alloc_etherdev:
+ pci_disable_pcie_error_reporting(pdev);
pci_release_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_MEM));
err_pci_reg:
Thanks!
^ permalink raw reply related
* RECEIVE YOUR ATM CARD BEFORE 23RD OF DECEMBER 2014
From: ACCESS BANK PLC ATM DEPARTMENT @ 2014-12-20 8:26 UTC (permalink / raw)
THIS ACCESS BANK PLC WANT TO INFORM YOU THAT YOUR ATM CARD IS READY, THAT IF YOU NEED IT, YOU MUST PAY THE $98. IF YOU ARE READY, MAKE SURE YOU SEND ME YOUR FULL NAMES AND YOUR DIRECT TELEPHONE NUMBER FOR ME TO CALL YOU SO THAT YOU CAN PAY DIRECTLY TO OUR ACCOUNT OFFICER.
Thanks,
DR. CHRIS MICHAEL
FROM ACCESS BANK PLC
E-MAIL: accessb575@gmail.com
^ permalink raw reply
* Re: [linux-nics] [PATCH]e100 in linux-3.18.0: some potential bugs
From: Jeff Kirsher @ 2014-12-20 10:18 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: todd.fujinaka, e1000-devel, netdev, linux.nics, Linux-nics
In-Reply-To: <000001d01c28$41c937c0$c55ba740$@163.com>
[-- Attachment #1: Type: text/plain, Size: 2155 bytes --]
On Sat, 2014-12-20 at 15:40 +0800, Jia-Ju Bai wrote:
> I have actually tested e100 driver on the real hardware(Intel 82559
> PCI
> Ethernet Controller), and find some bugs:
> The target file is drivers/net/ethernet/intel/e100.c, which is used to
> build
> e100.ko.
>
> (1) The function pci_pool_create is called by e100_probe when
> initializing
> the ethernet card driver. But when pci_pool_create is failed, which
> means
> that it returns NULL to nic->cbs_pool, the system crash will happen.
> Because
> pci_pool_alloc (in e100_alloc_cbs in e100_up in e100_open) need to use
> nic->cbs_pool to allocate the resource, but it is NULL. I suggest that
> a
> check can be added in the code to detect whether pci_pool_create
> returns
> NULL.
> (2) In the normal process, netif_napi_add is called in e100_probe, but
> netif_napi_del is not called in e100_remove. However, many other
> ethernet
> card drivers call them in pairs, even in the error handling paths,
> such as
> r8169 and igb.
>
> Meanwhile, I also write the patch to fix the bugs. I have run the
> patch on
> the hardware, it can work normally and fix the above bugs.
Did you actually experience an issue? Or is this a theoretical issue
that was never actually seen?
>
> diff --git a/drivers/net/ethernet/intel/e100.c
> b/drivers/net/ethernet/intel/e100.c
> index 781065e..2631d3f 100644
> --- a/drivers/net/ethernet/intel/e100.c
> +++ b/drivers/net/ethernet/intel/e100.c
> @@ -2969,6 +2969,11 @@ static int e100_probe(struct pci_dev *pdev,
> const
> struct pci_device_id *ent)
> nic->params.cbs.max * sizeof(struct cb),
> sizeof(u32),
> 0);
> + if(!(nic->cbs_pool))
> + {
> + err = -ENOMEM;
> + goto err_out_pool;
> + }
> netif_info(nic, probe, nic->netdev,
Minor nit-pick but your open bracket needs to be on the same line as the
if statement AND you need a space between the 'if' and (). So the above
code should look like:
if (!nic->cbs_pool) {
err = -ENOMEM;
goto err_out_pool;
}
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [linux-nics] [PATCH] e1000e in linux-3.18.0: some potential bugs
From: Jeff Kirsher @ 2014-12-20 10:22 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: todd.fujinaka, netdev, e1000-devel, linux.nics
In-Reply-To: <000f01d01c2b$4af1b3b0$e0d51b10$@163.com>
[-- Attachment #1: Type: text/plain, Size: 3185 bytes --]
On Sat, 2014-12-20 at 16:02 +0800, Jia-Ju Bai wrote:
> I have actually tested e1000e driver on the real hardware(Intel
> 82572EI
> PCI-E Gigabit Ethernet Controller), and find some potential bugs:
> The target file is drivers/net/ethernet/intel/e1000e/netdev.c, which
> is used
> to build e1000e.ko.
>
> (1) In the normal process, netif_napi_add is called in e1000_probe,
> but
> netif_napi_del is not called in e1000_remove. However, many other
> ethernet
> card drivers call them in pairs, even in the error handling paths,
> such as
> r8169 and igb.
>
> (2) The function vzalloc is called by e1000e_setup_rx_resources (in
> e1000_open) when initializing the ethernet card driver. But when
> vzalloc is
> failed, "err" segment in e1000e_setup_rx_resources is executed to
> return and
> then e1000e_free_tx_resources in "err_setup_rx" segment in e1000_open
> is
> executed to halt. However, "writel(0, tx_ring->head)" statement in
> e1000_clean_tx_ring in e1000e_free_tx_resources will cause system
> crash,
> because "tx_ring->head" is not assigned the value. In the code,
> "tx_ring->head" is initialized in e1000_configure_tx in
> e1000_configure
> after the e1000e_setup_rx_resources.
> (3) The same system crashes happens, when kcalloc in
> e1000e_setup_rx_resources is failed(returns NULL).
> (4) The same system crashes happens, when e1000_alloc_ring_dma in
> e1000e_setup_rx_resources is failed(returns error code).
>
> (5) In the normal process of e1000e, pci_enable_pcie_error_reporting
> and
> pci_disable_pcie_error_reporting is called in pairs in e1000_probe and
> e1000_remove. However, when pci_enable_pcie_error_reporting has been
> called
> and pci_save_state in e1000_probe is failed, "err_alloc_etherdev"
> segment in
> e1000_probe is executed immediately to exit, but
> pci_disable_pcie_error_reporting is not called.
> (6) The same situation happens when alloc_etherdev_mqs in e1000_probe
> is
> failed.
> (7) The same situation happens when ioremap in e1000_probe is failed.
> (8) The same situation happens when e1000_sw_init in e1000_probe is
> failed.
> (9) The same situation happens when register_netdev in e1000_probe is
> failed.
>
> (10) When request_irq in e1000_request_irq is failed,
> pm_qos_add_request in
> e1000_open is called, but pm_qos_remove_request is not called.
>
> Meanwhile, I also write the patch to fix the bugs. I have run the
> patch on
> the hardware, it can work normally and fix the above bugs.
Again, is this an issue you saw or a theoretical issue?
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 247335d..02d1e67 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -2444,6 +2444,8 @@ static void e1000_clean_tx_ring(struct
> e1000_ring
> *tx_ring)
> tx_ring->next_to_use = 0;
> tx_ring->next_to_clean = 0;
>
> + if(!(tx_ring->head))
> + return;
Need a space between the 'if' and the (). Please check your patches by
running checkpatch.pl on them before sending them out.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [linux-nics] [PATCH] e1000 in linux-3.18.0: a potential bug
From: Jeff Kirsher @ 2014-12-20 10:34 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: todd.fujinaka, netdev, e1000-devel, linux.nics
In-Reply-To: <000d01d01c29$a9edb870$fdc92950$@163.com>
[-- Attachment #1: Type: text/plain, Size: 936 bytes --]
On Sat, 2014-12-20 at 15:50 +0800, Jia-Ju Bai wrote:
> I have actually tested e1000 driver on the real hardware(Intel 82540EM
> PCI
> Gigabit Ethernet Controller), and find a potential bug:
> The target file is drivers/net/ethernet/intel/e1000/e1000_main.c,
> which is
> used to build e1000.ko.
>
> (1) In the normal process, netif_napi_add is called in e1000_probe,
> but
> netif_napi_del is not called in e1000_remove. However, many other
> ethernet
> card drivers call them in pairs, even in the error handling paths,
> such as
> r8169 and igb.
>
> Meanwhile, I also write the patch to fix the bug. I have run the patch
> on
> the hardware, it can work normally and fix the above bug.
Was this a bug you actually saw? Or a theoretical bug based on code
review?
I do not mind adding this to my queue so that we can review and test the
patch, although this will cause a fair amount of regression testing.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [linux-nics] [PATCH] igb in linux-3.18.0: some potential bugs
From: Jeff Kirsher @ 2014-12-20 10:35 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: todd.fujinaka, netdev, e1000-devel, linux.nics
In-Reply-To: <001101d01c2c$8dcc0040$a96400c0$@163.com>
[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]
On Sat, 2014-12-20 at 16:11 +0800, Jia-Ju Bai wrote:
> I have actually tested igb driver on the real hardware(Intel 82575EB
> PCI-E
> Gigabit Ethernet Controller), and find some potential bugs:
> The target file is drivers/net/ethernet/intel/igb/igb_main.c
>
> (1) In the normal process of igb, pci_enable_pcie_error_reporting and
> pci_disable_pcie_error_reporting is called in pairs in igb_probe and
> igb_remove. However, when pci_enable_pcie_error_reporting has been
> called
> and alloc_etherdev_mqs in igb_probe is failed, "err_alloc_etherdev"
> segment
> in igb_probe is executed immediately to exit, but
> pci_disable_pcie_error_reporting is not called.
> (2) The same situation happens when pci_iomap in igb_probe is failed.
> (3) The same situation happens when igb_sw_init in igb_probe is
> failed.
> (4) The same situation happens when register_netdev in igb_probe is
> failed.
> (5) The same situation happens when igb_init_i2c in igb_probe is
> failed.
>
> (6) The function kcalloc is called by igb_sw_init when initializing
> the
> ethernet card driver, but kfree is not called when register_netdev in
> igb_probe is failed, which may cause memory leak.
> (7) The same situation happens when igb_init_i2c in igb_probe is
> failed.
> (8) The same situation happens when kzalloc in igb_alloc_q_vector is
> failed.
> (9) The same situation happens when igb_alloc_q_vector in
> igb_alloc_q_vectors is failed.
>
> (10) When igb_init_i2c in igb_probe is failed, igb_enable_sriov is
> called in
> igb_probe_vfs, but igb_disable_sriov is not called.
> (11) The same situation with [10] happens when register_netdev in
> igb_probe
> is failed.
>
> Meanwhile, I also write the patch to fix the bugs. I have run the
> patch on
> the hardware, it can work normally and fix the above bugs.
Was this a bug you actually saw? Or a theoretical bug based on code
review?
I do not mind adding this to my queue so that we can review and test the
patch, although this will cause a fair amount of regression testing.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox