* Re: [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+
From: David Miller @ 2012-12-26 23:13 UTC (permalink / raw)
To: paolo.valente; +Cc: jhs, shemminger, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <1355938266-18459-1-git-send-email-paolo.valente@unimore.it>
From: Paolo valente <paolo.valente@unimore.it>
Date: Wed, 19 Dec 2012 18:31:06 +0100
> + /*
> + * The next assignment may let
> + * agg->initial_budget > agg->budgetmax
> + * hold, but this does not cause any harm
> + */
Please format comments in the networking:
/* Like
* this.
*/
and
/*
* Never
* like this.
*/
I know this file is full of exceptions, but that error is to be
corrected rather than expanded.
> + /*
> + * If lmax is lowered, through qfq_change_class, for a class
> + * owning pending packets with larger size than the new value of lmax,
> + * then the following condition may hold.
> + */
Likewise.
And I'm not applying this until someone familiar with this code
does some review of this patch. These are seriously non-trivial
changes.
^ permalink raw reply
* Re: [PATCH V3.8 5/5] rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-26 23:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linville, linux-wireless, netdev
In-Reply-To: <1356563631.30414.3.camel@edumazet-glaptop>
On 12/26/2012 05:13 PM, Eric Dumazet wrote:
> On Wed, 2012-12-26 at 16:08 -0600, Larry Finger wrote:
>> Kernel 3.8 implements checking of all DMA mapping calls and issues
>> a WARNING for the first it finds that is not checked.
>>
>> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
>> ---
>> drivers/net/wireless/rtlwifi/rtl8723ae/trx.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c b/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
>> index 87331d8..7ddd517 100644
>> --- a/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
>> +++ b/drivers/net/wireless/rtlwifi/rtl8723ae/trx.c
>> @@ -387,6 +387,12 @@ void rtl8723ae_tx_fill_desc(struct ieee80211_hw *hw,
>> PCI_DMA_TODEVICE);
>> u8 bw_40 = 0;
>>
>> + if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
>> + RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
>> + "DMA mapping error");
>> + return;
>> + }
>> + CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
>> if (mac->opmode == NL80211_IFTYPE_STATION) {
>> bw_40 = mac->bw_40;
>> } else if (mac->opmode == NL80211_IFTYPE_AP ||
>> @@ -542,6 +548,12 @@ void rtl8723ae_tx_fill_cmddesc(struct ieee80211_hw *hw,
>> PCI_DMA_TODEVICE);
>> __le16 fc = hdr->frame_control;
>>
>> + if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
>> + RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
>> + "DMA mapping error");
>> + return;
>> + }
>> + CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
>> CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE);
>>
>> if (firstseg)
>
> This seems wrong...
>
> CLEAR_PCI_TX_DESC_CONTENT(pdesc, TX_DESC_SIZE) doesnt need to be done
> twice, does it ?
No it does not. That is a patching error
Thanks,
Larry
^ permalink raw reply
* Re: [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
From: David Miller @ 2012-12-26 23:18 UTC (permalink / raw)
To: mike.marciniszyn
Cc: venkat.x.venkatsubra, linux-rdma, roland, rds-devel, netdev
In-Reply-To: <20121221180149.23716.54707.stgit@phlsvslse11.ph.intel.com>
From: Mike Marciniszyn <mike.marciniszyn@intel.com>
Date: Fri, 21 Dec 2012 13:01:49 -0500
> 0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
> added uses of sg_dma_len() and sg_dma_address(). This makes
> RDS DOA with the qib driver.
>
> IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
> respectively since some HCAs overload ib_sg_dma* operations.
>
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Applied.
^ permalink raw reply
* Re: [PATCH 2/2] IB/rds: suppress incompatible protocol when version is known
From: David Miller @ 2012-12-26 23:18 UTC (permalink / raw)
To: mike.marciniszyn
Cc: venkat.x.venkatsubra, linux-rdma, roland, rds-devel, netdev
In-Reply-To: <20121221180154.23716.44540.stgit@phlsvslse11.ph.intel.com>
From: Mike Marciniszyn <mike.marciniszyn@intel.com>
Date: Fri, 21 Dec 2012 13:01:54 -0500
> Add an else to only print the incompatible protocol message
> when version hasn't been established.
>
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] ipv4/ip_gre: set transport header correctly to gre header
From: David Miller @ 2012-12-26 23:20 UTC (permalink / raw)
To: yamahata; +Cc: netdev
In-Reply-To: <a6eb43766a3959d111cde14b237d435de0b19e15.1356403805.git.yamahata@valinux.co.jp>
From: Isaku Yamahata <yamahata@valinux.co.jp>
Date: Tue, 25 Dec 2012 11:51:03 +0900
> ipgre_tunnel_xmit() incorrectly sets transport header to inner payload
> instead of GRE header. It seems copy-and-pasted from ipip.c.
> So set transport header to gre header.
> (In ipip case the transport header is the inner ip header, so that's
> correct.)
>
> Found by inspection. In practice the incorrect transport header
> doesn't matter because the skb usually is sent to another net_device
> or socket, so the transport header isn't referenced.
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Applied.
^ permalink raw reply
* Re: [PATCH 2/2] ipv6/ip6_gre: set transport header correctly
From: David Miller @ 2012-12-26 23:20 UTC (permalink / raw)
To: yamahata; +Cc: netdev
In-Reply-To: <8b3ad21951fd0ab05910ce5d4f4646142b18663a.1356403806.git.yamahata@valinux.co.jp>
From: Isaku Yamahata <yamahata@valinux.co.jp>
Date: Tue, 25 Dec 2012 11:51:04 +0900
> ip6gre_xmit2() incorrectly sets transport header to inner payload
> instead of GRE header. It seems copy-and-pasted from ipip.c.
> Set transport header to gre header.
> (In ipip case the transport header is the inner ip header, so that's
> correct.)
>
> Found by inspection. In practice the incorrect transport header
> doesn't matter because the skb usually is sent to another net_device
> or socket, so the transport header isn't referenced.
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
From: David Miller @ 2012-12-26 23:21 UTC (permalink / raw)
To: bvanassche-HInyCGIudOg
Cc: mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w,
venkat.x.venkatsubra-QHcLZuEGTsvQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
rds-devel-N0ozoZBvEnrZJqsBc5GL+g, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <50D997D0.2020004-HInyCGIudOg@public.gmane.org>
From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Date: Tue, 25 Dec 2012 13:10:56 +0100
> The above patch will slow down RDS for anyone who is not using QLogic
> HCA's, isn' it ? How about replacing the above patch with the
> (untested) patch below ?
I agree with your suggestion but do this kind of simplification in
the next merge window, not now.
That's why I applied Mike's patches as-is.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Francois Romieu @ 2012-12-26 23:17 UTC (permalink / raw)
To: Rafał Miłecki; +Cc: netdev, David S. Miller
In-Reply-To: <1356363222-16672-1-git-send-email-zajec5@gmail.com>
Rafał Miłecki <zajec5@gmail.com> :
[...]
> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
[...]
> +static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
> +{
> + u32 val;
> + int i;
> +
> + if (!ring->mmio_base)
> + return;
static int bgmac_probe(struct bcma_device *core)
[...]
bgmac_chip_reset(bgmac);
err = bgmac_dma_alloc(bgmac);
mmio_base is only set in bgmac_dma_alloc.
-> remove the useless bgmac_chip_reset() in bgmac_probe() and the driver
won't need to test for !ring->mmio_base.
> +
> + /* Susend DMA TX ring first.
/* Suspend DMA TX ring first.
[...]
> +static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
> + struct bgmac_dma_ring *ring,
> + struct sk_buff *skb)
> +{
> + struct device *dma_dev = bgmac->core->dma_dev;
> + struct bgmac_dma_desc *dma_desc;
> + struct bgmac_slot_info *slot;
> + u32 ctl0, ctl1;
> + int free_slots;
> +
> + if (skb->len > BGMAC_DESC_CTL1_LEN) {
> + bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
> + return NETDEV_TX_BUSY;
> + }
The driver will loop if you don't stop queuing. It won't help.
> +
> + if (ring->start <= ring->end)
> + free_slots = ring->start - ring->end + BGMAC_TX_RING_SLOTS;
> + else
> + free_slots = ring->start - ring->end;
> + if (free_slots <= 1) {
> + bgmac_err(bgmac, "No free slots on ring 0x%X!\n", ring->mmio_base);
> + netif_stop_queue(bgmac->net_dev);
> + return NETDEV_TX_BUSY;
> + }
The driver should stop queueing after its processing if it detects that
it can't handle more requests.
> +
> + slot = &ring->slots[ring->end];
> + slot->skb = skb;
> + slot->dma_addr = dma_map_single(dma_dev, skb->data, skb->len, DMA_TO_DEVICE);
> + if (dma_mapping_error(dma_dev, slot->dma_addr)) {
> + bgmac_err(bgmac, "Mapping error of skb on ring 0x%X\n", ring->mmio_base);
> + return NETDEV_TX_BUSY;
> + }
Why don't you drop the packet and return TX_OK ?
> +
> + ctl0 = BGMAC_DESC_CTL0_IOC | BGMAC_DESC_CTL0_SOF | BGMAC_DESC_CTL0_EOF;
> + if (ring->end == ring->num_slots - 1)
> + ctl0 |= BGMAC_DESC_CTL0_EOT;
> + ctl1 = skb->len & BGMAC_DESC_CTL1_LEN;
> +
> + dma_desc = ring->cpu_base;
> + dma_desc += ring->end;
struct bgmac_dma_desc *dma_desc = ring->cpu_base + ring->end;
> + dma_desc->addr_low = cpu_to_le32(lower_32_bits(slot->dma_addr));
> + dma_desc->addr_high = cpu_to_le32(upper_32_bits(slot->dma_addr));
> + dma_desc->ctl0 = cpu_to_le32(ctl0);
> + dma_desc->ctl1 = cpu_to_le32(ctl1);
> +
> + wmb();
Are we sure that the hardware will never read a descriptor where ctl0 is
set and ctl1 is not (start_xmit, dma starts, competing start_xmit, oops) ?
> +
> + /* Increase ring->end to point empty slot. We tell hardware the first
> + * slot it shold *not* read.
* slot it should *not* read.
[...]
> +static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
> +{
> + struct device *dma_dev = bgmac->core->dma_dev;
> + struct bgmac_dma_desc *dma_desc;
The scope of 'slot' and 'dma_desc' can be reduced.
> + struct bgmac_slot_info *slot;
> + int empty_slot;
> +
> + /* The last slot that hardware didn't consume yet */
> + empty_slot = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
> + empty_slot &= BGMAC_DMA_TX_STATDPTR;
> + empty_slot /= sizeof(struct bgmac_dma_desc);
> +
> + while (ring->start != empty_slot) {
> + /* Set pointers */
> + dma_desc = ring->cpu_base;
> + dma_desc += ring->start;
> + slot = &ring->slots[ring->start];
> +
> + if (slot->skb) {
> + /* Unmap no longer used buffer */
> + dma_unmap_single(dma_dev, slot->dma_addr,
> + slot->skb->len, DMA_TO_DEVICE);
> + slot->dma_addr = 0;
> +
> + /* Free memory! :) */
> + dev_kfree_skb_any(slot->skb);
This is irq enabled NAPI poll context so you can dev_kfree_skb.
> + slot->skb = NULL;
> + } else {
> + bgmac_err(bgmac, "Hardware reported transmission for empty TX ring slot %d! End of ring: %d", ring->start, ring->end);
> + }
> +
> + if (++ring->start >= BGMAC_TX_RING_SLOTS)
> + ring->start = 0;
> + }
> +}
> +
> +static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
> +{
> + if (!ring->mmio_base)
> + return;
See remark about bgmac_dma_tx_reset.
[...]
> +static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
> + int weight)
> +{
> + struct device *dma_dev = bgmac->core->dma_dev;
> + struct bgmac_dma_desc *dma_desc;
> + struct bgmac_slot_info *slot;
> + struct sk_buff *skb, *new_skb;
> + struct bgmac_rx_header *rx;
> + u32 end_slot;
> + u16 len, flags;
> + int handled = 0;
The scope of several variables can be reduced as well.
--
Ueimor
^ permalink raw reply
* Re: [PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Rafał Miłecki @ 2012-12-27 0:45 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, David S. Miller
In-Reply-To: <20121226231721.GA8243@electric-eye.fr.zoreil.com>
Thanks for your comments, I'll improve the driver/patch.
2012/12/27 Francois Romieu <romieu@fr.zoreil.com>:
>> + struct device *dma_dev = bgmac->core->dma_dev;
>> + struct bgmac_dma_desc *dma_desc;
>> + struct bgmac_slot_info *slot;
>> + struct sk_buff *skb, *new_skb;
>> + struct bgmac_rx_header *rx;
>> + u32 end_slot;
>> + u16 len, flags;
>> + int handled = 0;
>
> The scope of several variables can be reduced as well.
I'm not 100% if I understand... Do you mean I should declare variables
inside "while" loop when possible? I couldn't find suggestions about
anywhere in Documentation, so please give me an extra work of
explanation :)
--
Rafał
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Jason Wang @ 2012-12-27 3:28 UTC (permalink / raw)
To: gaowanlong; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <50DACF22.5090707@cn.fujitsu.com>
On 12/26/2012 06:19 PM, Wanlong Gao wrote:
> On 12/26/2012 06:06 PM, Jason Wang wrote:
>> On 12/26/2012 03:06 PM, Wanlong Gao wrote:
>>> Add a cpu notifier to virtio-net, so that we can reset the
>>> virtqueue affinity if the cpu hotplug happens. It improve
>>> the performance through enabling or disabling the virtqueue
>>> affinity after doing cpu hotplug.
>> Hi Wanlong:
>>
>> Thanks for looking at this.
>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: virtualization@lists.linux-foundation.org
>>> Cc: netdev@vger.kernel.org
>>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>>> ---
>>> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 38 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index a6fcf15..9710cf4 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -26,6 +26,7 @@
>>> #include <linux/scatterlist.h>
>>> #include <linux/if_vlan.h>
>>> #include <linux/slab.h>
>>> +#include <linux/cpu.h>
>>>
>>> static int napi_weight = 128;
>>> module_param(napi_weight, int, 0444);
>>> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
>>> module_param(csum, bool, 0444);
>>> module_param(gso, bool, 0444);
>>>
>>> +static bool cpu_hotplug = false;
>>> +
>>> /* FIXME: MTU in config. */
>>> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>>> #define GOOD_COPY_LEN 128
>>> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
>>> vi->affinity_hint_set = false;
>>> }
>>>
>>> +static int virtnet_cpu_callback(struct notifier_block *nfb,
>>> + unsigned long action, void *hcpu)
>>> +{
>>> + switch(action) {
>>> + case CPU_ONLINE:
>>> + case CPU_ONLINE_FROZEN:
>>> + case CPU_DEAD:
>>> + case CPU_DEAD_FROZEN:
>>> + cpu_hotplug = true;
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> + return NOTIFY_OK;
>>> +}
>>> +
>>> +static struct notifier_block virtnet_cpu_notifier = {
>>> + .notifier_call = virtnet_cpu_callback,
>>> +};
>>> +
>>> static void virtnet_get_ringparam(struct net_device *dev,
>>> struct ethtool_ringparam *ring)
>>> {
>>> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>> */
>>> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>>> {
>>> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>>> + int txq;
>>> +
>>> + if (unlikely(cpu_hotplug == true)) {
>>> + virtnet_set_affinity(netdev_priv(dev), true);
>>> + cpu_hotplug = false;
>>> + }
>>> +
>> Why don't you just do this in callback?
> Callback can just give us a "hcpu", can't get the virtnet_info from callback. Am I missing something?
Well, I think you can just embed the notifier block into virtnet_info,
then use something like container_of in the callback to make the
notifier per device. This also solve the concern of Eric.
>> btw. Does qemu/kvm support cpu-hotplug now?
> From http://www.linux-kvm.org/page/CPUHotPlug, I saw that qemu-kvm can support hotplug
> but failed to merge to qemu.git, right?
Not sure, I just try latest qemu, it even does not have a cpu_set command.
Thanks
>
> Thanks,
> Wanlong Gao
>
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Jason Wang @ 2012-12-27 3:34 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Wanlong Gao, linux-kernel, Rusty Russell, virtualization, netdev
In-Reply-To: <20121226104638.GA18850@redhat.com>
On 12/26/2012 06:46 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 26, 2012 at 03:06:54PM +0800, Wanlong Gao wrote:
>> Add a cpu notifier to virtio-net, so that we can reset the
>> virtqueue affinity if the cpu hotplug happens. It improve
>> the performance through enabling or disabling the virtqueue
>> affinity after doing cpu hotplug.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Thanks for looking into this.
> Some comments:
>
> 1. Looks like the logic in
> virtnet_set_affinity (and in virtnet_select_queue)
> will not work very well when CPU IDs are not
> consequitive. This can happen with hot unplug.
>
> Maybe we should add a VQ allocator, and defining
> a per-cpu variable specifying the VQ instead
> of using CPU ID.
Yes, and generate the affinity hint based on the mapping. Btw, what does
VQ allocator means here?
>
>
> 2. The below code seems racy e.g. when CPU is added
> during device init.
>
> 3. using a global cpu_hotplug seems inelegant.
> In any case we should document what is the
> meaning of this variable.
>
>> ---
>> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index a6fcf15..9710cf4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -26,6 +26,7 @@
>> #include <linux/scatterlist.h>
>> #include <linux/if_vlan.h>
>> #include <linux/slab.h>
>> +#include <linux/cpu.h>
>>
>> static int napi_weight = 128;
>> module_param(napi_weight, int, 0444);
>> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
>> module_param(csum, bool, 0444);
>> module_param(gso, bool, 0444);
>>
>> +static bool cpu_hotplug = false;
>> +
>> /* FIXME: MTU in config. */
>> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>> #define GOOD_COPY_LEN 128
>> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
>> vi->affinity_hint_set = false;
>> }
>>
>> +static int virtnet_cpu_callback(struct notifier_block *nfb,
>> + unsigned long action, void *hcpu)
>> +{
>> + switch(action) {
>> + case CPU_ONLINE:
>> + case CPU_ONLINE_FROZEN:
>> + case CPU_DEAD:
>> + case CPU_DEAD_FROZEN:
>> + cpu_hotplug = true;
>> + break;
>> + default:
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block virtnet_cpu_notifier = {
>> + .notifier_call = virtnet_cpu_callback,
>> +};
>> +
>> static void virtnet_get_ringparam(struct net_device *dev,
>> struct ethtool_ringparam *ring)
>> {
>> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> */
>> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>> {
>> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>> + int txq;
>> +
>> + if (unlikely(cpu_hotplug == true)) {
>> + virtnet_set_affinity(netdev_priv(dev), true);
>> + cpu_hotplug = false;
>> + }
>> +
>> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>> smp_processor_id();
>>
>> while (unlikely(txq >= dev->real_num_tx_queues))
>> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
>> {
>> struct virtio_device *vdev = vi->vdev;
>>
>> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
>> +
>> virtnet_set_affinity(vi, false);
>>
>> vdev->config->del_vqs(vdev);
>> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
>> goto err_free;
>>
>> virtnet_set_affinity(vi, true);
>> +
>> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
>> + if (ret)
>> + goto err_free;
>> +
>> return 0;
>>
>> err_free:
>> --
>> 1.8.0
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Wanlong Gao @ 2012-12-27 3:43 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S. Tsirkin, netdev, linux-kernel, virtualization,
imammedo
In-Reply-To: <50DBC07B.7060303@redhat.com>
On 12/27/2012 11:28 AM, Jason Wang wrote:
> On 12/26/2012 06:19 PM, Wanlong Gao wrote:
>> On 12/26/2012 06:06 PM, Jason Wang wrote:
>>> On 12/26/2012 03:06 PM, Wanlong Gao wrote:
>>>> Add a cpu notifier to virtio-net, so that we can reset the
>>>> virtqueue affinity if the cpu hotplug happens. It improve
>>>> the performance through enabling or disabling the virtqueue
>>>> affinity after doing cpu hotplug.
>>> Hi Wanlong:
>>>
>>> Thanks for looking at this.
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Cc: netdev@vger.kernel.org
>>>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>>>> ---
>>>> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 38 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index a6fcf15..9710cf4 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -26,6 +26,7 @@
>>>> #include <linux/scatterlist.h>
>>>> #include <linux/if_vlan.h>
>>>> #include <linux/slab.h>
>>>> +#include <linux/cpu.h>
>>>>
>>>> static int napi_weight = 128;
>>>> module_param(napi_weight, int, 0444);
>>>> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
>>>> module_param(csum, bool, 0444);
>>>> module_param(gso, bool, 0444);
>>>>
>>>> +static bool cpu_hotplug = false;
>>>> +
>>>> /* FIXME: MTU in config. */
>>>> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>>>> #define GOOD_COPY_LEN 128
>>>> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
>>>> vi->affinity_hint_set = false;
>>>> }
>>>>
>>>> +static int virtnet_cpu_callback(struct notifier_block *nfb,
>>>> + unsigned long action, void *hcpu)
>>>> +{
>>>> + switch(action) {
>>>> + case CPU_ONLINE:
>>>> + case CPU_ONLINE_FROZEN:
>>>> + case CPU_DEAD:
>>>> + case CPU_DEAD_FROZEN:
>>>> + cpu_hotplug = true;
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> + return NOTIFY_OK;
>>>> +}
>>>> +
>>>> +static struct notifier_block virtnet_cpu_notifier = {
>>>> + .notifier_call = virtnet_cpu_callback,
>>>> +};
>>>> +
>>>> static void virtnet_get_ringparam(struct net_device *dev,
>>>> struct ethtool_ringparam *ring)
>>>> {
>>>> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>>> */
>>>> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>>>> {
>>>> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>>>> + int txq;
>>>> +
>>>> + if (unlikely(cpu_hotplug == true)) {
>>>> + virtnet_set_affinity(netdev_priv(dev), true);
>>>> + cpu_hotplug = false;
>>>> + }
>>>> +
>>> Why don't you just do this in callback?
>> Callback can just give us a "hcpu", can't get the virtnet_info from callback. Am I missing something?
>
> Well, I think you can just embed the notifier block into virtnet_info,
> then use something like container_of in the callback to make the
> notifier per device. This also solve the concern of Eric.
Yeah, thank you very much for your suggestion. I'll try it.
>>> btw. Does qemu/kvm support cpu-hotplug now?
>> From http://www.linux-kvm.org/page/CPUHotPlug, I saw that qemu-kvm can support hotplug
>> but failed to merge to qemu.git, right?
>
> Not sure, I just try latest qemu, it even does not have a cpu_set command.
Adding Igor to CC,
As I know, hotplug support is cleaned from qemu, and Igor want to rework it but not been completed?
I'm not sure about that, Igor, could you send out your tech-preview-patches?
Thanks,
Wanlong Gao
>
> Thanks
>>
>> Thanks,
>> Wanlong Gao
>>
>
>
^ permalink raw reply
* [GIT] Networking
From: David Miller @ 2012-12-27 3:44 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) GRE tunnel drivers don't set the transport header properly, they
also blindly deref the inner protocol ipv4 and needs some checks.
Fixes from Isaku Yamahata.
2) Fix sleeps while atomic in netdevice rename code, from Eric
Dumazet.
3) Fix double-spinlock in solos-pci driver, from Dan Carpenter.
4) More ARP bug fixes. Fix lockdep splat in arp_solicit() and
then the bug accidently added by that fix. From Eric Dumazet
and Cong Wang.
5) Remove some __dev* annotations that slipped back in, as well
as all HOTPLUG references. From Greg KH
6) RDS protocol uses wrong interfaces to access scatter-gather
elements, causing a regression. From Mike Marciniszyn.
7) Fix build error in cpts driver, from Richard Cochran.
8) Fix arithmetic in packet scheduler, from Stefan Hasko.
9) Similarly, fix association during calculation of random
backoff in batman-adv. From Akinobu Mita.
Please pull, thanks a lot!
The following changes since commit c4271c6e37c32105492cbbed35f45330cb327b94:
NFS: Kill fscache warnings when mounting without -ofsc (2012-12-21 08:32:09 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
for you to fetch changes up to ae782bb16c35ce27512beeda9be6024c88f85b08:
ipv6/ip6_gre: set transport header correctly (2012-12-26 15:19:56 -0800)
----------------------------------------------------------------
Akinobu Mita (1):
batman-adv: fix random jitter calculation
Cong Wang (1):
arp: fix a regression in arp_solicit()
Dan Carpenter (1):
solos-pci: double lock in geos_gpio_store()
Eric Dumazet (5):
ip_gre: fix possible use after free
net: devnet_rename_seq should be a seqcount
tuntap: dont use a private kmem_cache
ipv4: arp: fix a lockdep splat in arp_solicit()
tcp: should drop incoming frames without ACK flag set
Gao feng (1):
bridge: call br_netpoll_disable in br_add_if
Greg KH (2):
Drivers: network: more __dev* removal
CONFIG_HOTPLUG removal from networking core
Isaku Yamahata (3):
ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
ipv4/ip_gre: set transport header correctly to gre header
ipv6/ip6_gre: set transport header correctly
Li Zefan (1):
netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
Marciniszyn, Mike (2):
IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
IB/rds: suppress incompatible protocol when version is known
Richard Cochran (2):
cpts: fix build error by removing useless code.
cpts: fix a run time warn_on.
Stefan Hasko (1):
net: sched: integer overflow fix
Yan Burman (1):
net/vxlan: Use the underlying device index when joining/leaving multicast groups
drivers/atm/solos-pci.c | 2 +-
drivers/net/ethernet/marvell/mvmdio.c | 6 +++---
drivers/net/ethernet/marvell/mvneta.c | 19 +++++++++----------
drivers/net/ethernet/ti/cpts.c | 3 +--
drivers/net/ethernet/ti/cpts.h | 1 -
drivers/net/tun.c | 24 +++---------------------
drivers/net/vxlan.c | 6 ++++--
drivers/net/wireless/rtlwifi/rtl8723ae/sw.c | 2 +-
include/linux/netdevice.h | 2 +-
include/net/sock.h | 2 +-
net/batman-adv/bat_iv_ogm.c | 2 +-
net/bridge/br_if.c | 8 +++++---
net/core/dev.c | 18 +++++++++---------
net/core/net-sysfs.c | 4 ----
net/core/sock.c | 4 ++--
net/ipv4/arp.c | 10 ++++------
net/ipv4/ip_gre.c | 13 ++++++++++---
net/ipv4/tcp_input.c | 14 ++++++++++----
net/ipv6/ip6_gre.c | 3 +--
net/rds/ib_cm.c | 11 +++++------
net/rds/ib_recv.c | 9 ++++++---
net/sched/sch_htb.c | 2 +-
net/wireless/reg.c | 7 -------
net/wireless/sysfs.c | 4 ----
24 files changed, 78 insertions(+), 98 deletions(-)
^ permalink raw reply
* [PATCH 1/2] vhost_net: correct error hanlding in vhost_net_set_backend()
From: Jason Wang @ 2012-12-27 6:39 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
Fix the leaking of oldubufs and fd refcnt when fail to initialized used ring.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ebd08b2..629d6b5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -834,8 +834,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
vhost_net_enable_vq(n, vq);
r = vhost_init_used(vq);
- if (r)
- goto err_vq;
+ if (r) {
+ sock = NULL;
+ goto err_used;
+ }
n->tx_packets = 0;
n->tx_zcopy_err = 0;
@@ -859,8 +861,14 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_unlock(&n->dev.mutex);
return 0;
+err_used:
+ if (oldubufs)
+ vhost_ubuf_put_and_wait(oldubufs);
+ if (oldsock)
+ fput(oldsock->file);
err_ubufs:
- fput(sock->file);
+ if (sock)
+ fput(sock->file);
err_vq:
mutex_unlock(&vq->mutex);
err:
--
1.7.1
^ permalink raw reply related
* [PATCH 2/2] vhost: handle polling failure
From: Jason Wang @ 2012-12-27 6:39 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1356590360-32770-1-git-send-email-jasowang@redhat.com>
Currently, polling error were ignored in vhost. This may lead some issues (e.g
kenrel crash when passing a tap fd to vhost before calling TUNSETIFF). Fix this
by:
- extend the idea of vhost_net_poll_state to all vhost_polls
- change the state only when polling is succeed
- make vhost_poll_start() report errors to the caller, which could be used
caller or userspace.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 75 +++++++++++++++++--------------------------------
drivers/vhost/vhost.c | 16 +++++++++-
drivers/vhost/vhost.h | 11 ++++++-
3 files changed, 50 insertions(+), 52 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 629d6b5..56e7f5a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -64,20 +64,10 @@ enum {
VHOST_NET_VQ_MAX = 2,
};
-enum vhost_net_poll_state {
- VHOST_NET_POLL_DISABLED = 0,
- VHOST_NET_POLL_STARTED = 1,
- VHOST_NET_POLL_STOPPED = 2,
-};
-
struct vhost_net {
struct vhost_dev dev;
struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
struct vhost_poll poll[VHOST_NET_VQ_MAX];
- /* Tells us whether we are polling a socket for TX.
- * We only do this when socket buffer fills up.
- * Protected by tx vq lock. */
- enum vhost_net_poll_state tx_poll_state;
/* Number of TX recently submitted.
* Protected by tx vq lock. */
unsigned tx_packets;
@@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
}
}
-/* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
-{
- if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
- return;
- vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
- net->tx_poll_state = VHOST_NET_POLL_STOPPED;
-}
-
-/* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
-{
- if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
- return;
- vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
- net->tx_poll_state = VHOST_NET_POLL_STARTED;
-}
-
/* In case of DMA done not in order in lower device driver for some reason.
* upend_idx is used to track end of used idx, done_idx is used to track head
* of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -252,7 +224,7 @@ static void handle_tx(struct vhost_net *net)
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf) {
mutex_lock(&vq->mutex);
- tx_poll_start(net, sock);
+ vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
mutex_unlock(&vq->mutex);
return;
}
@@ -261,7 +233,7 @@ static void handle_tx(struct vhost_net *net)
vhost_disable_notify(&net->dev, vq);
if (wmem < sock->sk->sk_sndbuf / 2)
- tx_poll_stop(net);
+ vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
hdr_size = vq->vhost_hlen;
zcopy = vq->ubufs;
@@ -283,7 +255,8 @@ static void handle_tx(struct vhost_net *net)
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
- tx_poll_start(net, sock);
+ vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
+ sock->file);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -294,7 +267,8 @@ static void handle_tx(struct vhost_net *net)
(vq->upend_idx - vq->done_idx) :
(vq->upend_idx + UIO_MAXIOV - vq->done_idx);
if (unlikely(num_pends > VHOST_MAX_PEND)) {
- tx_poll_start(net, sock);
+ vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
+ sock->file);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -360,7 +334,8 @@ static void handle_tx(struct vhost_net *net)
}
vhost_discard_vq_desc(vq, 1);
if (err == -EAGAIN || err == -ENOBUFS)
- tx_poll_start(net, sock);
+ vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
+ sock->file);
break;
}
if (err != len)
@@ -623,7 +598,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
- n->tx_poll_state = VHOST_NET_POLL_DISABLED;
f->private_data = n;
@@ -635,27 +609,26 @@ static void vhost_net_disable_vq(struct vhost_net *n,
{
if (!vq->private_data)
return;
- if (vq == n->vqs + VHOST_NET_VQ_TX) {
- tx_poll_stop(n);
- n->tx_poll_state = VHOST_NET_POLL_DISABLED;
- } else
+ if (vq == n->vqs + VHOST_NET_VQ_TX)
+ vhost_poll_stop(n->poll + VHOST_NET_VQ_TX);
+ else
vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
}
-static void vhost_net_enable_vq(struct vhost_net *n,
- struct vhost_virtqueue *vq)
+static int vhost_net_enable_vq(struct vhost_net *n,
+ struct vhost_virtqueue *vq)
{
+ int err, index = vq - n->vqs;
struct socket *sock;
sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
- return;
- if (vq == n->vqs + VHOST_NET_VQ_TX) {
- n->tx_poll_state = VHOST_NET_POLL_STOPPED;
- tx_poll_start(n, sock);
- } else
- vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+ return 0;
+
+ n->poll[index].state = VHOST_POLL_STOPPED;
+ err = vhost_poll_start(n->poll + index, sock->file);
+ return err;
}
static struct socket *vhost_net_stop_vq(struct vhost_net *n,
@@ -831,12 +804,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
vq->ubufs = ubufs;
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, sock);
- vhost_net_enable_vq(n, vq);
+ r = vhost_net_enable_vq(n, vq);
+ if (r) {
+ sock = NULL;
+ goto err_enable;
+ }
r = vhost_init_used(vq);
if (r) {
sock = NULL;
- goto err_used;
+ goto err_enable;
}
n->tx_packets = 0;
@@ -861,7 +838,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_unlock(&n->dev.mutex);
return 0;
-err_used:
+err_enable:
if (oldubufs)
vhost_ubuf_put_and_wait(oldubufs);
if (oldsock)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 34389f7..1cb2604 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -77,26 +77,36 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
poll->dev = dev;
+ poll->state = VHOST_POLL_DISABLED;
vhost_work_init(&poll->work, fn);
}
/* Start polling a file. We add ourselves to file's wait queue. The caller must
* keep a reference to a file until after vhost_poll_stop is called. */
-void vhost_poll_start(struct vhost_poll *poll, struct file *file)
+int vhost_poll_start(struct vhost_poll *poll, struct file *file)
{
unsigned long mask;
+ if (unlikely(poll->state != VHOST_POLL_STOPPED))
+ return 0;
mask = file->f_op->poll(file, &poll->table);
+ if (mask & POLLERR)
+ return -EINVAL;
if (mask)
vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
+ poll->state = VHOST_POLL_STARTED;
+ return 0;
}
/* Stop polling a file. After this function returns, it becomes safe to drop the
* file reference. You must also flush afterwards. */
void vhost_poll_stop(struct vhost_poll *poll)
{
+ if (likely(poll->state != VHOST_POLL_STARTED))
+ return;
remove_wait_queue(poll->wqh, &poll->wait);
+ poll->state = VHOST_POLL_STOPPED;
}
static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
@@ -791,8 +801,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
if (filep)
fput(filep);
- if (pollstart && vq->handle_kick)
+ if (pollstart && vq->handle_kick) {
+ vq->poll.state = VHOST_POLL_STOPPED;
vhost_poll_start(&vq->poll, vq->kick);
+ }
mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2639c58..98861d9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -26,6 +26,12 @@ struct vhost_work {
unsigned done_seq;
};
+enum vhost_poll_state {
+ VHOST_POLL_DISABLED = 0,
+ VHOST_POLL_STARTED = 1,
+ VHOST_POLL_STOPPED = 2,
+};
+
/* Poll a file (eventfd or socket) */
/* Note: there's nothing vhost specific about this structure. */
struct vhost_poll {
@@ -35,6 +41,9 @@ struct vhost_poll {
struct vhost_work work;
unsigned long mask;
struct vhost_dev *dev;
+ /* Tells us whether we are polling a file.
+ * Protected by tx vq lock. */
+ enum vhost_poll_state state;
};
void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
@@ -42,7 +51,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev);
-void vhost_poll_start(struct vhost_poll *poll, struct file *file);
+int vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
void vhost_poll_flush(struct vhost_poll *poll);
void vhost_poll_queue(struct vhost_poll *poll);
--
1.7.1
^ permalink raw reply related
* [PATCH 1/1 net-next] NET: FEC: dynamtic check DMA desc buff type
From: Frank Li @ 2012-12-27 9:10 UTC (permalink / raw)
To: lznuaa, davem, s.hauer; +Cc: linux-arm-kernel, shawn.guo, netdev, Frank Li
MX6 and mx28 support enhanced DMA descript buff to support 1588
ptp. But MX25, MX3x, MX5x can't support enhanced DMA descript buff.
Check fec type and choose correct DAM descript buff type.
Signed-off-by: Frank Li <Frank.Li@freescale.com>
---
drivers/net/ethernet/freescale/Kconfig | 2 +-
drivers/net/ethernet/freescale/fec.c | 155 ++++++++++++++++++++++++--------
drivers/net/ethernet/freescale/fec.h | 8 ++-
3 files changed, 126 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index ec490d7..d1edb2e 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -94,7 +94,7 @@ config GIANFAR
config FEC_PTP
bool "PTP Hardware Clock (PHC)"
- depends on FEC && ARCH_MXC && !SOC_IMX25 && !SOC_IMX27 && !SOC_IMX35 && !SOC_IMX5
+ depends on FEC && ARCH_MXC
select PTP_1588_CLOCK
--help---
Say Y here if you want to use PTP Hardware Clock (PHC) in the
diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index 0704bca..21f385d 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -76,6 +76,8 @@
#define FEC_QUIRK_USE_GASKET (1 << 2)
/* Controller has GBIT support */
#define FEC_QUIRK_HAS_GBIT (1 << 3)
+/* Controller has extend desc buffer */
+#define FEC_QUICK_HAS_BUFDESC_EX (1 << 4)
static struct platform_device_id fec_devtype[] = {
{
@@ -93,7 +95,8 @@ static struct platform_device_id fec_devtype[] = {
.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME,
}, {
.name = "imx6q-fec",
- .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT,
+ .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT |
+ FEC_QUICK_HAS_BUFDESC_EX,
}, {
/* sentinel */
}
@@ -117,7 +120,9 @@ static const struct of_device_id fec_dt_ids[] = {
MODULE_DEVICE_TABLE(of, fec_dt_ids);
static unsigned char macaddr[ETH_ALEN];
+static int fec_ptp_enable = 1;
module_param_array(macaddr, byte, NULL, 0);
+module_param(fec_ptp_enable, int, 1);
MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
#if defined(CONFIG_M5272)
@@ -144,6 +149,12 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
#error "FEC: descriptor ring size constants too large"
#endif
+#ifdef CONFIG_FEC_PTP
+#if (((RX_RING_SIZE + TX_RING_SIZE) * 32) > PAGE_SIZE)
+#error "FEC: descriptor ring size constants too large"
+#endif
+#endif
+
/* Interrupt events/masks. */
#define FEC_ENET_HBERR ((uint)0x80000000) /* Heartbeat error */
#define FEC_ENET_BABR ((uint)0x40000000) /* Babbling receiver */
@@ -192,6 +203,36 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
static int mii_cnt;
+#ifdef CONFIG_FEC_PTP
+static struct bufdesc * get_nextdesc(struct bufdesc * bdp, int is_ex)
+{
+ struct bufdesc_ex *ex = (struct bufdesc_ex *)bdp;
+ if (is_ex)
+ return (struct bufdesc*)(ex++);
+ else
+ return bdp++;
+}
+
+static struct bufdesc * get_prevdesc(struct bufdesc * bdp, int is_ex)
+{
+ struct bufdesc_ex *ex = (struct bufdesc_ex *)bdp;
+ if (is_ex)
+ return (struct bufdesc*)(ex--);
+ else
+ return bdp--;
+}
+#else
+static inline struct bufdesc * get_nextdesc(struct bufdesc * bdp, int is_ex)
+{
+ return bdp++;
+}
+static inline struct bufdesc * get_prevdesc(struct bufdesc * bdp, int is_ex)
+{
+ return bdp--;
+}
+
+#endif
+
static void *swap_buffer(void *bufaddr, int len)
{
int i;
@@ -248,7 +289,15 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
*/
if (((unsigned long) bufaddr) & FEC_ALIGNMENT) {
unsigned int index;
+#ifdef CONFIG_FEC_PTP
+ if (fep->bufdesc_ex)
+ index = (struct bufdesc_ex*)bdp -
+ (struct bufdesc_ex*)fep->tx_bd_base;
+ else
+ index = bdp - fep->tx_bd_base;
+#else
index = bdp - fep->tx_bd_base;
+#endif
memcpy(fep->tx_bounce[index], skb->data, skb->len);
bufaddr = fep->tx_bounce[index];
}
@@ -281,14 +330,18 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
bdp->cbd_sc = status;
#ifdef CONFIG_FEC_PTP
- bdp->cbd_bdu = 0;
- if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP &&
+ if (fep->bufdesc_ex)
+ {
+ struct bufdesc_ex * ebdp = (struct bufdesc_ex *)bdp;
+ ebdp->cbd_bdu = 0;
+ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP &&
fep->hwts_tx_en)) {
- bdp->cbd_esc = (BD_ENET_TX_TS | BD_ENET_TX_INT);
+ ebdp->cbd_esc = (BD_ENET_TX_TS | BD_ENET_TX_INT);
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
- } else {
+ } else {
- bdp->cbd_esc = BD_ENET_TX_INT;
+ ebdp->cbd_esc = BD_ENET_TX_INT;
+ }
}
#endif
/* Trigger transmission start */
@@ -298,7 +351,7 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
if (status & BD_ENET_TX_WRAP)
bdp = fep->tx_bd_base;
else
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
if (bdp == fep->dirty_tx) {
fep->tx_full = 1;
@@ -359,8 +412,12 @@ fec_restart(struct net_device *ndev, int duplex)
/* Set receive and transmit descriptor base. */
writel(fep->bd_dma, fep->hwp + FEC_R_DES_START);
- writel((unsigned long)fep->bd_dma + sizeof(struct bufdesc) * RX_RING_SIZE,
- fep->hwp + FEC_X_DES_START);
+ if (fep->bufdesc_ex)
+ writel((unsigned long)fep->bd_dma + sizeof(struct bufdesc_ex)
+ * RX_RING_SIZE, fep->hwp + FEC_X_DES_START);
+ else
+ writel((unsigned long)fep->bd_dma + sizeof(struct bufdesc)
+ * RX_RING_SIZE, fep->hwp + FEC_X_DES_START);
fep->dirty_tx = fep->cur_tx = fep->tx_bd_base;
fep->cur_rx = fep->rx_bd_base;
@@ -449,7 +506,8 @@ fec_restart(struct net_device *ndev, int duplex)
}
#ifdef CONFIG_FEC_PTP
- ecntl |= (1 << 4);
+ if (fep->bufdesc_ex)
+ ecntl |= (1 << 4);
#endif
/* And last, enable the transmit and receive processing */
@@ -457,7 +515,8 @@ fec_restart(struct net_device *ndev, int duplex)
writel(0, fep->hwp + FEC_R_DES_ACTIVE);
#ifdef CONFIG_FEC_PTP
- fec_ptp_start_cyclecounter(ndev);
+ if (fep->bufdesc_ex)
+ fec_ptp_start_cyclecounter(ndev);
#endif
/* Enable interrupts we wish to service */
writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
@@ -545,14 +604,16 @@ fec_enet_tx(struct net_device *ndev)
}
#ifdef CONFIG_FEC_PTP
- if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) {
+ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS) &&
+ fep->bufdesc_ex) {
struct skb_shared_hwtstamps shhwtstamps;
unsigned long flags;
+ struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
memset(&shhwtstamps, 0, sizeof(shhwtstamps));
spin_lock_irqsave(&fep->tmreg_lock, flags);
shhwtstamps.hwtstamp = ns_to_ktime(
- timecounter_cyc2time(&fep->tc, bdp->ts));
+ timecounter_cyc2time(&fep->tc, ebdp->ts));
spin_unlock_irqrestore(&fep->tmreg_lock, flags);
skb_tstamp_tx(skb, &shhwtstamps);
}
@@ -575,7 +636,7 @@ fec_enet_tx(struct net_device *ndev)
if (status & BD_ENET_TX_WRAP)
bdp = fep->tx_bd_base;
else
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
/* Since we have freed up a buffer, the ring is no longer full
*/
@@ -685,16 +746,18 @@ fec_enet_rx(struct net_device *ndev)
skb->protocol = eth_type_trans(skb, ndev);
#ifdef CONFIG_FEC_PTP
/* Get receive timestamp from the skb */
- if (fep->hwts_rx_en) {
+ if (fep->hwts_rx_en && fep->bufdesc_ex) {
struct skb_shared_hwtstamps *shhwtstamps =
skb_hwtstamps(skb);
unsigned long flags;
+ struct bufdesc_ex *ebdp =
+ (struct bufdesc_ex *)bdp;
memset(shhwtstamps, 0, sizeof(*shhwtstamps));
spin_lock_irqsave(&fep->tmreg_lock, flags);
shhwtstamps->hwtstamp = ns_to_ktime(
- timecounter_cyc2time(&fep->tc, bdp->ts));
+ timecounter_cyc2time(&fep->tc, ebdp->ts));
spin_unlock_irqrestore(&fep->tmreg_lock, flags);
}
#endif
@@ -713,16 +776,20 @@ rx_processing_done:
bdp->cbd_sc = status;
#ifdef CONFIG_FEC_PTP
- bdp->cbd_esc = BD_ENET_RX_INT;
- bdp->cbd_prot = 0;
- bdp->cbd_bdu = 0;
+ if (fep->bufdesc_ex) {
+ struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
+
+ ebdp->cbd_esc = BD_ENET_RX_INT;
+ ebdp->cbd_prot = 0;
+ ebdp->cbd_bdu = 0;
+ }
#endif
/* Update BD pointer to next entry */
if (status & BD_ENET_RX_WRAP)
bdp = fep->rx_bd_base;
else
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
/* Doing this here will keep the FEC running while we process
* incoming frames. On a heavily loaded network, we should be
* able to keep up at the expense of system resources.
@@ -1158,7 +1225,7 @@ static int fec_enet_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd)
return -ENODEV;
#ifdef CONFIG_FEC_PTP
- if (cmd == SIOCSHWTSTAMP)
+ if (cmd == SIOCSHWTSTAMP && fep->bufdesc_ex)
return fec_ptp_ioctl(ndev, rq, cmd);
#endif
return phy_mii_ioctl(phydev, rq, cmd);
@@ -1180,7 +1247,7 @@ static void fec_enet_free_buffers(struct net_device *ndev)
FEC_ENET_RX_FRSIZE, DMA_FROM_DEVICE);
if (skb)
dev_kfree_skb(skb);
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
}
bdp = fep->tx_bd_base;
@@ -1208,13 +1275,16 @@ static int fec_enet_alloc_buffers(struct net_device *ndev)
FEC_ENET_RX_FRSIZE, DMA_FROM_DEVICE);
bdp->cbd_sc = BD_ENET_RX_EMPTY;
#ifdef CONFIG_FEC_PTP
- bdp->cbd_esc = BD_ENET_RX_INT;
+ if (fep->bufdesc_ex) {
+ struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
+ ebdp->cbd_esc = BD_ENET_RX_INT;
+ }
#endif
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
}
/* Set the last buffer to wrap. */
- bdp--;
+ bdp = get_prevdesc(bdp, fep->bufdesc_ex);
bdp->cbd_sc |= BD_SC_WRAP;
bdp = fep->tx_bd_base;
@@ -1225,13 +1295,16 @@ static int fec_enet_alloc_buffers(struct net_device *ndev)
bdp->cbd_bufaddr = 0;
#ifdef CONFIG_FEC_PTP
- bdp->cbd_esc = BD_ENET_RX_INT;
+ if (fep->bufdesc_ex) {
+ struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
+ ebdp->cbd_esc = BD_ENET_RX_INT;
+ }
#endif
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
}
/* Set the last buffer to wrap. */
- bdp--;
+ bdp = get_prevdesc(bdp, fep->bufdesc_ex);
bdp->cbd_sc |= BD_SC_WRAP;
return 0;
@@ -1444,7 +1517,11 @@ static int fec_enet_init(struct net_device *ndev)
/* Set receive and transmit descriptor base. */
fep->rx_bd_base = cbd_base;
- fep->tx_bd_base = cbd_base + RX_RING_SIZE;
+ if (fep->bufdesc_ex)
+ fep->tx_bd_base = (struct bufdesc*)
+ (((struct bufdesc_ex*)cbd_base + RX_RING_SIZE));
+ else
+ fep->tx_bd_base = cbd_base + RX_RING_SIZE;
/* The FEC Ethernet specific entries in the device structure */
ndev->watchdog_timeo = TX_TIMEOUT;
@@ -1457,11 +1534,11 @@ static int fec_enet_init(struct net_device *ndev)
/* Initialize the BD for every fragment in the page. */
bdp->cbd_sc = 0;
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
}
/* Set the last buffer to wrap */
- bdp--;
+ bdp = get_prevdesc(bdp, fep->bufdesc_ex);
bdp->cbd_sc |= BD_SC_WRAP;
/* ...and the same for transmit */
@@ -1471,11 +1548,11 @@ static int fec_enet_init(struct net_device *ndev)
/* Initialize the BD for every fragment in the page. */
bdp->cbd_sc = 0;
bdp->cbd_bufaddr = 0;
- bdp++;
+ bdp = get_nextdesc(bdp, fep->bufdesc_ex);
}
/* Set the last buffer to wrap */
- bdp--;
+ bdp = get_prevdesc(bdp, fep->bufdesc_ex);
bdp->cbd_sc |= BD_SC_WRAP;
fec_restart(ndev, 0);
@@ -1574,6 +1651,8 @@ fec_probe(struct platform_device *pdev)
fep->pdev = pdev;
fep->dev_id = dev_id++;
+ fep->bufdesc_ex = 0;
+
if (!fep->hwp) {
ret = -ENOMEM;
goto failed_ioremap;
@@ -1630,16 +1709,19 @@ fec_probe(struct platform_device *pdev)
#ifdef CONFIG_FEC_PTP
fep->clk_ptp = devm_clk_get(&pdev->dev, "ptp");
+ fep->bufdesc_ex = fec_ptp_enable ?
+ pdev->id_entry->driver_data & FEC_QUICK_HAS_BUFDESC_EX : 0;
if (IS_ERR(fep->clk_ptp)) {
ret = PTR_ERR(fep->clk_ptp);
- goto failed_clk;
+ fep->bufdesc_ex = 0;
}
#endif
clk_prepare_enable(fep->clk_ahb);
clk_prepare_enable(fep->clk_ipg);
#ifdef CONFIG_FEC_PTP
- clk_prepare_enable(fep->clk_ptp);
+ if (!IS_ERR(fep->clk_ptp))
+ clk_prepare_enable(fep->clk_ptp);
#endif
reg_phy = devm_regulator_get(&pdev->dev, "phy");
if (!IS_ERR(reg_phy)) {
@@ -1669,7 +1751,8 @@ fec_probe(struct platform_device *pdev)
goto failed_register;
#ifdef CONFIG_FEC_PTP
- fec_ptp_init(ndev, pdev);
+ if (fep->bufdesc_ex)
+ fec_ptp_init(ndev, pdev);
#endif
return 0;
diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index c5a3bc1..53f19d3 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -94,14 +94,17 @@ struct bufdesc {
unsigned short cbd_datlen; /* Data length */
unsigned short cbd_sc; /* Control and status info */
unsigned long cbd_bufaddr; /* Buffer address */
-#ifdef CONFIG_FEC_PTP
+};
+
+struct bufdesc_ex {
+ struct bufdesc desc;
unsigned long cbd_esc;
unsigned long cbd_prot;
unsigned long cbd_bdu;
unsigned long ts;
unsigned short res0[4];
-#endif
};
+
#else
struct bufdesc {
unsigned short cbd_sc; /* Control and status info */
@@ -243,6 +246,7 @@ struct fec_enet_private {
int full_duplex;
struct completion mdio_done;
int irq[FEC_IRQ_NUM];
+ int bufdesc_ex;
#ifdef CONFIG_FEC_PTP
struct ptp_clock *ptp_clock;
--
1.7.1
^ permalink raw reply related
* Re: [PATCH 2/2] vhost: handle polling failure
From: Wanlong Gao @ 2012-12-27 10:01 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1356590360-32770-2-git-send-email-jasowang@redhat.com>
On 12/27/2012 02:39 PM, Jason Wang wrote:
> Currently, polling error were ignored in vhost. This may lead some issues (e.g
> kenrel crash when passing a tap fd to vhost before calling TUNSETIFF). Fix this
> by:
Can this kernel crash be reproduced by hand?
Thanks,
Wanlong Gao
>
> - extend the idea of vhost_net_poll_state to all vhost_polls
> - change the state only when polling is succeed
> - make vhost_poll_start() report errors to the caller, which could be used
> caller or userspace.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 75 +++++++++++++++++--------------------------------
> drivers/vhost/vhost.c | 16 +++++++++-
> drivers/vhost/vhost.h | 11 ++++++-
> 3 files changed, 50 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 629d6b5..56e7f5a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -64,20 +64,10 @@ enum {
> VHOST_NET_VQ_MAX = 2,
> };
>
> -enum vhost_net_poll_state {
> - VHOST_NET_POLL_DISABLED = 0,
> - VHOST_NET_POLL_STARTED = 1,
> - VHOST_NET_POLL_STOPPED = 2,
> -};
> -
> struct vhost_net {
> struct vhost_dev dev;
> struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
> struct vhost_poll poll[VHOST_NET_VQ_MAX];
> - /* Tells us whether we are polling a socket for TX.
> - * We only do this when socket buffer fills up.
> - * Protected by tx vq lock. */
> - enum vhost_net_poll_state tx_poll_state;
> /* Number of TX recently submitted.
> * Protected by tx vq lock. */
> unsigned tx_packets;
> @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
> }
> }
>
> -/* Caller must have TX VQ lock */
> -static void tx_poll_stop(struct vhost_net *net)
> -{
> - if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
> - return;
> - vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> - net->tx_poll_state = VHOST_NET_POLL_STOPPED;
> -}
> -
> -/* Caller must have TX VQ lock */
> -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> -{
> - if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
> - return;
> - vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> - net->tx_poll_state = VHOST_NET_POLL_STARTED;
> -}
> -
> /* In case of DMA done not in order in lower device driver for some reason.
> * upend_idx is used to track end of used idx, done_idx is used to track head
> * of used idx. Once lower device DMA done contiguously, we will signal KVM
> @@ -252,7 +224,7 @@ static void handle_tx(struct vhost_net *net)
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf) {
> mutex_lock(&vq->mutex);
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> mutex_unlock(&vq->mutex);
> return;
> }
> @@ -261,7 +233,7 @@ static void handle_tx(struct vhost_net *net)
> vhost_disable_notify(&net->dev, vq);
>
> if (wmem < sock->sk->sk_sndbuf / 2)
> - tx_poll_stop(net);
> + vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> hdr_size = vq->vhost_hlen;
> zcopy = vq->ubufs;
>
> @@ -283,7 +255,8 @@ static void handle_tx(struct vhost_net *net)
>
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -294,7 +267,8 @@ static void handle_tx(struct vhost_net *net)
> (vq->upend_idx - vq->done_idx) :
> (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> if (unlikely(num_pends > VHOST_MAX_PEND)) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -360,7 +334,8 @@ static void handle_tx(struct vhost_net *net)
> }
> vhost_discard_vq_desc(vq, 1);
> if (err == -EAGAIN || err == -ENOBUFS)
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> break;
> }
> if (err != len)
> @@ -623,7 +598,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>
> vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
> vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
>
> f->private_data = n;
>
> @@ -635,27 +609,26 @@ static void vhost_net_disable_vq(struct vhost_net *n,
> {
> if (!vq->private_data)
> return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - tx_poll_stop(n);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> - } else
> + if (vq == n->vqs + VHOST_NET_VQ_TX)
> + vhost_poll_stop(n->poll + VHOST_NET_VQ_TX);
> + else
> vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
> }
>
> -static void vhost_net_enable_vq(struct vhost_net *n,
> - struct vhost_virtqueue *vq)
> +static int vhost_net_enable_vq(struct vhost_net *n,
> + struct vhost_virtqueue *vq)
> {
> + int err, index = vq - n->vqs;
> struct socket *sock;
>
> sock = rcu_dereference_protected(vq->private_data,
> lockdep_is_held(&vq->mutex));
> if (!sock)
> - return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - n->tx_poll_state = VHOST_NET_POLL_STOPPED;
> - tx_poll_start(n, sock);
> - } else
> - vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
> + return 0;
> +
> + n->poll[index].state = VHOST_POLL_STOPPED;
> + err = vhost_poll_start(n->poll + index, sock->file);
> + return err;
> }
>
> static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> @@ -831,12 +804,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> vq->ubufs = ubufs;
> vhost_net_disable_vq(n, vq);
> rcu_assign_pointer(vq->private_data, sock);
> - vhost_net_enable_vq(n, vq);
> + r = vhost_net_enable_vq(n, vq);
> + if (r) {
> + sock = NULL;
> + goto err_enable;
> + }
>
> r = vhost_init_used(vq);
> if (r) {
> sock = NULL;
> - goto err_used;
> + goto err_enable;
> }
>
> n->tx_packets = 0;
> @@ -861,7 +838,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> -err_used:
> +err_enable:
> if (oldubufs)
> vhost_ubuf_put_and_wait(oldubufs);
> if (oldsock)
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34389f7..1cb2604 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -77,26 +77,36 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> init_poll_funcptr(&poll->table, vhost_poll_func);
> poll->mask = mask;
> poll->dev = dev;
> + poll->state = VHOST_POLL_DISABLED;
>
> vhost_work_init(&poll->work, fn);
> }
>
> /* Start polling a file. We add ourselves to file's wait queue. The caller must
> * keep a reference to a file until after vhost_poll_stop is called. */
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file)
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file)
> {
> unsigned long mask;
> + if (unlikely(poll->state != VHOST_POLL_STOPPED))
> + return 0;
>
> mask = file->f_op->poll(file, &poll->table);
> + if (mask & POLLERR)
> + return -EINVAL;
> if (mask)
> vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
> + poll->state = VHOST_POLL_STARTED;
> + return 0;
> }
>
> /* Stop polling a file. After this function returns, it becomes safe to drop the
> * file reference. You must also flush afterwards. */
> void vhost_poll_stop(struct vhost_poll *poll)
> {
> + if (likely(poll->state != VHOST_POLL_STARTED))
> + return;
> remove_wait_queue(poll->wqh, &poll->wait);
> + poll->state = VHOST_POLL_STOPPED;
> }
>
> static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
> @@ -791,8 +801,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
> if (filep)
> fput(filep);
>
> - if (pollstart && vq->handle_kick)
> + if (pollstart && vq->handle_kick) {
> + vq->poll.state = VHOST_POLL_STOPPED;
> vhost_poll_start(&vq->poll, vq->kick);
> + }
>
> mutex_unlock(&vq->mutex);
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 2639c58..98861d9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -26,6 +26,12 @@ struct vhost_work {
> unsigned done_seq;
> };
>
> +enum vhost_poll_state {
> + VHOST_POLL_DISABLED = 0,
> + VHOST_POLL_STARTED = 1,
> + VHOST_POLL_STOPPED = 2,
> +};
> +
> /* Poll a file (eventfd or socket) */
> /* Note: there's nothing vhost specific about this structure. */
> struct vhost_poll {
> @@ -35,6 +41,9 @@ struct vhost_poll {
> struct vhost_work work;
> unsigned long mask;
> struct vhost_dev *dev;
> + /* Tells us whether we are polling a file.
> + * Protected by tx vq lock. */
> + enum vhost_poll_state state;
> };
>
> void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
> @@ -42,7 +51,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
>
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> unsigned long mask, struct vhost_dev *dev);
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> void vhost_poll_queue(struct vhost_poll *poll);
>
^ permalink raw reply
* Re: [PATCH 1/1 net-next] NET: FEC: dynamtic check DMA desc buff type
From: Sascha Hauer @ 2012-12-27 10:35 UTC (permalink / raw)
To: Frank Li; +Cc: lznuaa, davem, linux-arm-kernel, shawn.guo, netdev
In-Reply-To: <1356599447-5698-1-git-send-email-Frank.Li@freescale.com>
On Thu, Dec 27, 2012 at 05:10:47PM +0800, Frank Li wrote:
> MX6 and mx28 support enhanced DMA descript buff to support 1588
> ptp. But MX25, MX3x, MX5x can't support enhanced DMA descript buff.
> Check fec type and choose correct DAM descript buff type.
>
> Signed-off-by: Frank Li <Frank.Li@freescale.com>
Please provide a series that gets rid of CONFIG_FEC_PTP. This patch here
might be a first step to it, but until it's complete and we can have a
look at the end result: No.
> ---
> drivers/net/ethernet/freescale/Kconfig | 2 +-
> drivers/net/ethernet/freescale/fec.c | 155 ++++++++++++++++++++++++--------
> drivers/net/ethernet/freescale/fec.h | 8 ++-
> 3 files changed, 126 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
> index ec490d7..d1edb2e 100644
> --- a/drivers/net/ethernet/freescale/Kconfig
> +++ b/drivers/net/ethernet/freescale/Kconfig
> @@ -94,7 +94,7 @@ config GIANFAR
>
> config FEC_PTP
> bool "PTP Hardware Clock (PHC)"
> - depends on FEC && ARCH_MXC && !SOC_IMX25 && !SOC_IMX27 && !SOC_IMX35 && !SOC_IMX5
> + depends on FEC && ARCH_MXC
> select PTP_1588_CLOCK
> --help---
> Say Y here if you want to use PTP Hardware Clock (PHC) in the
> diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
> index 0704bca..21f385d 100644
> --- a/drivers/net/ethernet/freescale/fec.c
> +++ b/drivers/net/ethernet/freescale/fec.c
> @@ -76,6 +76,8 @@
> #define FEC_QUIRK_USE_GASKET (1 << 2)
> /* Controller has GBIT support */
> #define FEC_QUIRK_HAS_GBIT (1 << 3)
> +/* Controller has extend desc buffer */
> +#define FEC_QUICK_HAS_BUFDESC_EX (1 << 4)
It's 'quirk', not 'quick'
>
> static struct platform_device_id fec_devtype[] = {
> {
> @@ -93,7 +95,8 @@ static struct platform_device_id fec_devtype[] = {
> .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME,
> }, {
> .name = "imx6q-fec",
> - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT,
> + .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT |
> + FEC_QUICK_HAS_BUFDESC_EX,
> }, {
> /* sentinel */
> }
> @@ -117,7 +120,9 @@ static const struct of_device_id fec_dt_ids[] = {
> MODULE_DEVICE_TABLE(of, fec_dt_ids);
>
> static unsigned char macaddr[ETH_ALEN];
> +static int fec_ptp_enable = 1;
> module_param_array(macaddr, byte, NULL, 0);
> +module_param(fec_ptp_enable, int, 1);
> MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
The 'fec_ptp_enable' shouldn't be weaved into the mac address. Use a
separate block of code for this.
Also, this shouldn't be here at all. If you want to make this
configurable via kernel commandline, then this should be a separate
patch so that it can be discussed separately. Also you should provide a
description *why* you think this is necessary.
>
> #if defined(CONFIG_M5272)
> @@ -144,6 +149,12 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
> #error "FEC: descriptor ring size constants too large"
> #endif
>
> +#ifdef CONFIG_FEC_PTP
> +#if (((RX_RING_SIZE + TX_RING_SIZE) * 32) > PAGE_SIZE)
> +#error "FEC: descriptor ring size constants too large"
> +#endif
> +#endif
> +
> /* Interrupt events/masks. */
> #define FEC_ENET_HBERR ((uint)0x80000000) /* Heartbeat error */
> #define FEC_ENET_BABR ((uint)0x40000000) /* Babbling receiver */
> @@ -192,6 +203,36 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
>
> static int mii_cnt;
>
> +#ifdef CONFIG_FEC_PTP
> +static struct bufdesc * get_nextdesc(struct bufdesc * bdp, int is_ex)
driver specific functions should always have a driver specific prefix.
This patch has coding style issues. Please run through checkpatch.
Sascha
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Michael S. Tsirkin @ 2012-12-27 11:51 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <50DBC1B8.9050406@redhat.com>
On Thu, Dec 27, 2012 at 11:34:16AM +0800, Jason Wang wrote:
> On 12/26/2012 06:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 26, 2012 at 03:06:54PM +0800, Wanlong Gao wrote:
> >> Add a cpu notifier to virtio-net, so that we can reset the
> >> virtqueue affinity if the cpu hotplug happens. It improve
> >> the performance through enabling or disabling the virtqueue
> >> affinity after doing cpu hotplug.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Cc: virtualization@lists.linux-foundation.org
> >> Cc: netdev@vger.kernel.org
> >> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > Thanks for looking into this.
> > Some comments:
> >
> > 1. Looks like the logic in
> > virtnet_set_affinity (and in virtnet_select_queue)
> > will not work very well when CPU IDs are not
> > consequitive. This can happen with hot unplug.
> >
> > Maybe we should add a VQ allocator, and defining
> > a per-cpu variable specifying the VQ instead
> > of using CPU ID.
>
> Yes, and generate the affinity hint based on the mapping. Btw, what does
> VQ allocator means here?
Some logic to generate CPU to VQ mapping.
> >
> >
> > 2. The below code seems racy e.g. when CPU is added
> > during device init.
> >
> > 3. using a global cpu_hotplug seems inelegant.
> > In any case we should document what is the
> > meaning of this variable.
> >
> >> ---
> >> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 38 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index a6fcf15..9710cf4 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/scatterlist.h>
> >> #include <linux/if_vlan.h>
> >> #include <linux/slab.h>
> >> +#include <linux/cpu.h>
> >>
> >> static int napi_weight = 128;
> >> module_param(napi_weight, int, 0444);
> >> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
> >> module_param(csum, bool, 0444);
> >> module_param(gso, bool, 0444);
> >>
> >> +static bool cpu_hotplug = false;
> >> +
> >> /* FIXME: MTU in config. */
> >> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >> #define GOOD_COPY_LEN 128
> >> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> >> vi->affinity_hint_set = false;
> >> }
> >>
> >> +static int virtnet_cpu_callback(struct notifier_block *nfb,
> >> + unsigned long action, void *hcpu)
> >> +{
> >> + switch(action) {
> >> + case CPU_ONLINE:
> >> + case CPU_ONLINE_FROZEN:
> >> + case CPU_DEAD:
> >> + case CPU_DEAD_FROZEN:
> >> + cpu_hotplug = true;
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> + return NOTIFY_OK;
> >> +}
> >> +
> >> +static struct notifier_block virtnet_cpu_notifier = {
> >> + .notifier_call = virtnet_cpu_callback,
> >> +};
> >> +
> >> static void virtnet_get_ringparam(struct net_device *dev,
> >> struct ethtool_ringparam *ring)
> >> {
> >> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> >> */
> >> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> >> {
> >> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> >> + int txq;
> >> +
> >> + if (unlikely(cpu_hotplug == true)) {
> >> + virtnet_set_affinity(netdev_priv(dev), true);
> >> + cpu_hotplug = false;
> >> + }
> >> +
> >> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> >> smp_processor_id();
> >>
> >> while (unlikely(txq >= dev->real_num_tx_queues))
> >> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> >> {
> >> struct virtio_device *vdev = vi->vdev;
> >>
> >> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
> >> +
> >> virtnet_set_affinity(vi, false);
> >>
> >> vdev->config->del_vqs(vdev);
> >> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
> >> goto err_free;
> >>
> >> virtnet_set_affinity(vi, true);
> >> +
> >> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
> >> + if (ret)
> >> + goto err_free;
> >> +
> >> return 0;
> >>
> >> err_free:
> >> --
> >> 1.8.0
^ permalink raw reply
* echo 0 > /proc/sys/net/ipv6/conf/lo/disable_ipv6 broken re-investigation
From: Balakumaran Kannan @ 2012-12-27 12:30 UTC (permalink / raw)
To: netdev
Dear All,
Please have a look at the references below.
Ref1: http://markmail.org/message/q2u2eik3pvm6qprg#query:+page:1+mid:q2u2eik3pvm6qprg+state:results
Ref2: http://markmail.org/message/ikfz2iiuyimwp35l#query:+page:1+mid:ikfz2iiuyimwp35l+state:results
In Ref1, "disable_ipv6 for lo broken in 2.6.37-rc4" issue was reported.
In Ref2, this issue seems to have fixed by "reverting 'administrative
down' address handling" commits.
It seems above fix solves only "ping6 ::1" issue, But it has some
other issues like below
1. ping6 to link local addresses of the host fails.
2. ping6 to link local addresses of other hosts fails.
3. It seems link local addresses of other interfaces has become non
functional, so dhcpv6 will fail to assign dynamic IPv6 address because
dhcp6c uses link local address of the interface.
The above observation is on 2.6.35 kernel, but it seems this can be
reproduced in latest kernel versions also (3.6).
Could anybody verify and confirms this issue?
It seems the fix provided in Ref2, adds route only for ::1 but not for
other local addresses (host's own address). Disabling IPv6 of lo
removes all entries for lo from routing table. Enabling IPv6 of lo
doesn't re-creates these entries. So pinging to local addresses of
other interfaces (for example eth0) fails. Also can't able to ping
other machines connected to host. It seems like a total network
communication failure happens after disabling and enabling IPv6 of lo.
Regards,
K.Balakumaran
^ permalink raw reply
* Re: [PATCH 1/2] vhost_net: correct error hanlding in vhost_net_set_backend()
From: Michael S. Tsirkin @ 2012-12-27 13:03 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1356590360-32770-1-git-send-email-jasowang@redhat.com>
On Thu, Dec 27, 2012 at 02:39:20PM +0800, Jason Wang wrote:
> Currently, polling error were ignored in vhost. This may lead some issues (e.g
> kenrel crash when passing a tap fd to vhost before calling TUNSETIFF). Fix this
> by:
>
> - extend the idea of vhost_net_poll_state to all vhost_polls
> - change the state only when polling is succeed
> - make vhost_poll_start() report errors to the caller, which could be used
> caller or userspace.
Maybe it could but this patch just ignores these errors.
And it's not clear how would userspace handle these errors.
Also, since we have a reference on the fd, it would seem
that once poll succeeds it can't fail in the future.
So two other options would make more sense to me:
- if vhost is bound to tun without SETIFF, fail this immediately
- if vhost is bound to tun without SETIFF, defer polling
until SETIFF
Option 1 would seem much easier to implement, I think it's
preferable.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 75 +++++++++++++++++--------------------------------
> drivers/vhost/vhost.c | 16 +++++++++-
> drivers/vhost/vhost.h | 11 ++++++-
> 3 files changed, 50 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 629d6b5..56e7f5a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -64,20 +64,10 @@ enum {
> VHOST_NET_VQ_MAX = 2,
> };
>
> -enum vhost_net_poll_state {
> - VHOST_NET_POLL_DISABLED = 0,
> - VHOST_NET_POLL_STARTED = 1,
> - VHOST_NET_POLL_STOPPED = 2,
> -};
> -
> struct vhost_net {
> struct vhost_dev dev;
> struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
> struct vhost_poll poll[VHOST_NET_VQ_MAX];
> - /* Tells us whether we are polling a socket for TX.
> - * We only do this when socket buffer fills up.
> - * Protected by tx vq lock. */
> - enum vhost_net_poll_state tx_poll_state;
> /* Number of TX recently submitted.
> * Protected by tx vq lock. */
> unsigned tx_packets;
> @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
> }
> }
>
> -/* Caller must have TX VQ lock */
> -static void tx_poll_stop(struct vhost_net *net)
> -{
> - if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
> - return;
> - vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> - net->tx_poll_state = VHOST_NET_POLL_STOPPED;
> -}
> -
> -/* Caller must have TX VQ lock */
> -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> -{
> - if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
> - return;
> - vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> - net->tx_poll_state = VHOST_NET_POLL_STARTED;
> -}
> -
> /* In case of DMA done not in order in lower device driver for some reason.
> * upend_idx is used to track end of used idx, done_idx is used to track head
> * of used idx. Once lower device DMA done contiguously, we will signal KVM
> @@ -252,7 +224,7 @@ static void handle_tx(struct vhost_net *net)
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf) {
> mutex_lock(&vq->mutex);
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> mutex_unlock(&vq->mutex);
> return;
> }
> @@ -261,7 +233,7 @@ static void handle_tx(struct vhost_net *net)
> vhost_disable_notify(&net->dev, vq);
>
> if (wmem < sock->sk->sk_sndbuf / 2)
> - tx_poll_stop(net);
> + vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
> hdr_size = vq->vhost_hlen;
> zcopy = vq->ubufs;
>
> @@ -283,7 +255,8 @@ static void handle_tx(struct vhost_net *net)
>
> wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -294,7 +267,8 @@ static void handle_tx(struct vhost_net *net)
> (vq->upend_idx - vq->done_idx) :
> (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> if (unlikely(num_pends > VHOST_MAX_PEND)) {
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
> }
> @@ -360,7 +334,8 @@ static void handle_tx(struct vhost_net *net)
> }
> vhost_discard_vq_desc(vq, 1);
> if (err == -EAGAIN || err == -ENOBUFS)
> - tx_poll_start(net, sock);
> + vhost_poll_start(net->poll + VHOST_NET_VQ_TX,
> + sock->file);
> break;
> }
> if (err != len)
> @@ -623,7 +598,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>
> vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
> vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
>
> f->private_data = n;
>
> @@ -635,27 +609,26 @@ static void vhost_net_disable_vq(struct vhost_net *n,
> {
> if (!vq->private_data)
> return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - tx_poll_stop(n);
> - n->tx_poll_state = VHOST_NET_POLL_DISABLED;
> - } else
> + if (vq == n->vqs + VHOST_NET_VQ_TX)
> + vhost_poll_stop(n->poll + VHOST_NET_VQ_TX);
> + else
> vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
> }
>
> -static void vhost_net_enable_vq(struct vhost_net *n,
> - struct vhost_virtqueue *vq)
> +static int vhost_net_enable_vq(struct vhost_net *n,
> + struct vhost_virtqueue *vq)
> {
> + int err, index = vq - n->vqs;
> struct socket *sock;
>
> sock = rcu_dereference_protected(vq->private_data,
> lockdep_is_held(&vq->mutex));
> if (!sock)
> - return;
> - if (vq == n->vqs + VHOST_NET_VQ_TX) {
> - n->tx_poll_state = VHOST_NET_POLL_STOPPED;
> - tx_poll_start(n, sock);
> - } else
> - vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
> + return 0;
> +
> + n->poll[index].state = VHOST_POLL_STOPPED;
> + err = vhost_poll_start(n->poll + index, sock->file);
> + return err;
> }
>
> static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> @@ -831,12 +804,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> vq->ubufs = ubufs;
> vhost_net_disable_vq(n, vq);
> rcu_assign_pointer(vq->private_data, sock);
> - vhost_net_enable_vq(n, vq);
> + r = vhost_net_enable_vq(n, vq);
> + if (r) {
> + sock = NULL;
> + goto err_enable;
> + }
>
> r = vhost_init_used(vq);
> if (r) {
> sock = NULL;
> - goto err_used;
> + goto err_enable;
> }
>
> n->tx_packets = 0;
> @@ -861,7 +838,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> -err_used:
> +err_enable:
> if (oldubufs)
> vhost_ubuf_put_and_wait(oldubufs);
> if (oldsock)
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34389f7..1cb2604 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -77,26 +77,36 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> init_poll_funcptr(&poll->table, vhost_poll_func);
> poll->mask = mask;
> poll->dev = dev;
> + poll->state = VHOST_POLL_DISABLED;
>
> vhost_work_init(&poll->work, fn);
> }
>
> /* Start polling a file. We add ourselves to file's wait queue. The caller must
> * keep a reference to a file until after vhost_poll_stop is called. */
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file)
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file)
> {
> unsigned long mask;
> + if (unlikely(poll->state != VHOST_POLL_STOPPED))
> + return 0;
>
> mask = file->f_op->poll(file, &poll->table);
> + if (mask & POLLERR)
> + return -EINVAL;
> if (mask)
> vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
> + poll->state = VHOST_POLL_STARTED;
> + return 0;
> }
>
Hmm, interesting. I note that tun has this:
if (tun->dev->reg_state != NETREG_REGISTERED)
mask = POLLERR;
So apparently we sometimes return POLLERR when poll
did succeed, then test below wouldn't remove
from wqh in this case. Maybe it's a bug in tun,
need to look into this.
> /* Stop polling a file. After this function returns, it becomes safe to drop the
> * file reference. You must also flush afterwards. */
> void vhost_poll_stop(struct vhost_poll *poll)
> {
> + if (likely(poll->state != VHOST_POLL_STARTED))
> + return;
> remove_wait_queue(poll->wqh, &poll->wait);
> + poll->state = VHOST_POLL_STOPPED;
> }
>
> static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
> @@ -791,8 +801,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
> if (filep)
> fput(filep);
>
> - if (pollstart && vq->handle_kick)
> + if (pollstart && vq->handle_kick) {
> + vq->poll.state = VHOST_POLL_STOPPED;
> vhost_poll_start(&vq->poll, vq->kick);
> + }
>
> mutex_unlock(&vq->mutex);
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 2639c58..98861d9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -26,6 +26,12 @@ struct vhost_work {
> unsigned done_seq;
> };
>
> +enum vhost_poll_state {
> + VHOST_POLL_DISABLED = 0,
> + VHOST_POLL_STARTED = 1,
> + VHOST_POLL_STOPPED = 2,
> +};
> +
> /* Poll a file (eventfd or socket) */
> /* Note: there's nothing vhost specific about this structure. */
> struct vhost_poll {
> @@ -35,6 +41,9 @@ struct vhost_poll {
> struct vhost_work work;
> unsigned long mask;
> struct vhost_dev *dev;
> + /* Tells us whether we are polling a file.
> + * Protected by tx vq lock. */
tx vq lock does not make sense in this context.
> + enum vhost_poll_state state;
> };
>
> void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
> @@ -42,7 +51,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
>
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> unsigned long mask, struct vhost_dev *dev);
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> void vhost_poll_queue(struct vhost_poll *poll);
> --
> 1.7.1
^ permalink raw reply
* Re: [PATCH 1/2] vhost_net: correct error hanlding in vhost_net_set_backend()
From: Michael S. Tsirkin @ 2012-12-27 13:14 UTC (permalink / raw)
To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1356590360-32770-1-git-send-email-jasowang@redhat.com>
On Thu, Dec 27, 2012 at 02:39:19PM +0800, Jason Wang wrote:
> Fix the leaking of oldubufs and fd refcnt when fail to initialized used ring.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 14 +++++++++++---
> 1 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ebd08b2..629d6b5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -834,8 +834,10 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> vhost_net_enable_vq(n, vq);
>
> r = vhost_init_used(vq);
> - if (r)
> - goto err_vq;
> + if (r) {
> + sock = NULL;
> + goto err_used;
> + }
>
> n->tx_packets = 0;
> n->tx_zcopy_err = 0;
> @@ -859,8 +861,14 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> +err_used:
> + if (oldubufs)
> + vhost_ubuf_put_and_wait(oldubufs);
> + if (oldsock)
> + fput(oldsock->file);
> err_ubufs:
> - fput(sock->file);
> + if (sock)
> + fput(sock->file);
> err_vq:
> mutex_unlock(&vq->mutex);
> err:
I think it's a real bug, but I don't see how the fix
makes sense.
We are returning an error, so we ideally
revert to the state before the faulty
operation. So this should put sock and ubufs,
not oldsock/oldubufs.
The best way is probably to change
vhost_init_used so that it gets private data
pointer as a parameter.
We can then call it before ubuf alloc.
You can then add err_used right after err_ubufs
with no extra logic.
--
MST
^ permalink raw reply
* netlink NLM_F_DUMP for unsupported address family / PF_UNSPEC
From: Timo Teras @ 2012-12-27 15:19 UTC (permalink / raw)
To: netdev
It seems that currently PF_UNSPEC is overloaded when dumping rtnetlink
things. It is used for two purposes: as wildcard entry to dump
all protocol families, and as fallback for unsupported families.
On my system with IPv[46] only and no IPX, running "ip -f ipx
route" would print the IPv4 and IPv6 routes instead of "unsupported" or
"not implemented" error which is rather confusing and unexpected.
Just removing the fallback from rtnl_get_dumpit() does not sound right
since some commands seem to rely on this behaviour e.g. RTM_GETQDISC.
Perhaps rtnl_dump_all should check that request family truly was
PF_UNSPEC or error out if not?
- Timo
^ permalink raw reply
* Re: Linux kernel handling of IPv6 temporary addresses
From: George Kargiotakis @ 2012-12-27 15:57 UTC (permalink / raw)
To: netdev
In-Reply-To: <20121114.180824.1930899985436392426.davem@davemloft.net>
Hello all,
I had previously informed this list about the issue of the linux kernel
losing IPv6 privacy extensions by a local LAN attacker.
Recently I've found that there's actually another, more serious in my
opinion, issue that follows the previous one. If the user tries to
disconnect/reconnect the network device/connection for whatever reason
(e.g. thinking he might gain back privacy extensions), then the device
gets IPs from SLAAC that have the "tentative" flag and never loses
that. That means that IPv6 functionality for that device is from then
on completely lost. I haven't been able to bring back the kernel to a
working IPv6 state without a reboot.
This is definitely a DoS situation and it needs fixing.
Here are the steps to reproduce:
== Step 1. Boot Ubuntu 12.10 (kernel 3.5.0-17-generic) ==
ubuntu@ubuntu:~$ ip a ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
inet6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fdbb:aaaa:bbbb:cccc:ad1f:9166:93d4:fd6d/64 scope global temporary dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global dynamic
valid_lft 86379sec preferred_lft 3579sec
inet6 fe80::5054:ff:fe8b:995d/64 scope link
valid_lft forever preferred_lft forever
ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
net.ipv6.conf.eth0.use_tempaddr = 2
net.ipv6.conf.lo.use_tempaddr = 2
ubuntu@ubuntu:~$ nmcli con status
NAME UUID DEVICES DEFAULT VPN MASTER-PATH
Wired connection 1 923e6729-74a7-4389-9dbd-43ed7db3d1b8 eth0 yes no --
ubuntu@ubuntu:~$ nmcli dev status
DEVICE TYPE STATE
eth0 802-3-ethernet connected
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=70.9 ms
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 70.994/70.994/70.994/0.000 ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
17:57:37.784658 IP6 2001:db8:f00:f00:ad1f:9166:93d4:fd6d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
17:57:37.855257 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:ad1f:9166:93d4:fd6d: ICMP6, echo reply, seq 1, length 64
== Step 2. flood RAs on the LAN ==
$ dmesg | tail
[ 1093.642053] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642062] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642065] IPv6: ipv6_create_tempaddr: retry temporary address regeneration
[ 1093.642067] IPv6: ipv6_create_tempaddr: regeneration time exceeded - disabled temporary address support
ubuntu@ubuntu:~$ sysctl -a | grep use_tempaddr
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
net.ipv6.conf.eth0.use_tempaddr = -1
net.ipv6.conf.lo.use_tempaddr = 2
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
64 bytes from 2a00:1450:4002:800::100e: icmp_seq=1 ttl=53 time=77.5 ms
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 77.568/77.568/77.568/0.000 ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
17:59:38.204173 IP6 2001:db8:f00:f00:5054:ff:fe8b:995d > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
17:59:38.281437 IP6 2a00:1450:4002:800::100e > 2001:db8:f00:f00:5054:ff:fe8b:995d: ICMP6, echo reply, seq 1, length 64
//notice the change of IPv6 address to the one not using privacy extensions even after the flooding has finished long ago.
== Step 3. Disconnect/Reconnect connection ==
// restoring net.ipv6.conf.eth0.use_tempaddr to value '2' makes no difference at all for the rest of the process
# nmcli dev disconnect iface eth0
# nmcli con up uuid 923e6729-74a7-4389-9dbd-43ed7db3d1b8
ubuntu@ubuntu:~$ ip a ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:8b:99:5d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.96/24 brd 192.168.1.255 scope global eth0
inet6 2001:db8:f00:f00:5054:ff:fe8b:995d/64 scope global tentative dynamic
valid_lft 86400sec preferred_lft 3600sec
inet6 fdbb:aaaa:bbbb:cccc:5054:ff:fe8b:995d/64 scope global tentative dynamic
valid_lft 86400sec preferred_lft 3600sec
inet6 fe80::5054:ff:fe8b:995d/64 scope link tentative
valid_lft forever preferred_lft forever
//Notice the "tentative" flag of the IPs on the device
//ping6 2a00:1450:4002:800::100e while in another terminal: tcpdump -ni eth0 ip6
ubuntu@ubuntu:~$ ping6 2a00:1450:4002:800::100e -c1
PING 2a00:1450:4002:800::100e(2a00:1450:4002:800::100e) 56 data bytes
^C
--- 2a00:1450:4002:800::100e ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
# tcpdump -ni eth0 host 2a00:1450:4002:800::100e
18:01:45.264194 IP6 ::1 > 2a00:1450:4002:800::100e: ICMP6, echo request, seq 1, length 64
Summary:
Before flooding it uses IP: 2001:db8:f00:f00:ad1f:9166:93d4:fd6d
After flooding it uses IP: 2001:db8:f00:f00:5054:ff:fe8b:995d --> it has lost privacy extensions
After disconnect/reconnect it tries to use IP: ::1 --> it has lost IPv6 connectivity
Best regards,
--
George Kargiotakis
https://void.gr
GPG KeyID: 0xE4F4FFE6
GPG Fingerprint: 9EB8 31BE C618 07CE 1B51 818D 4A0A 1BC8 E4F4 FFE6
^ permalink raw reply
* [PATCH 3.8 1/5 V2] rtlwifi: Fix warning for unchecked pci_map_single() call
From: Larry Finger @ 2012-12-27 16:37 UTC (permalink / raw)
To: linville-2XuSBdqkA4R54TAoqtyWWQ
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, Larry Finger,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1356626252-4058-1-git-send-email-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
Kernel 3.8 implements checking of all DMA mapping calls and issues
a WARNING for the first it finds that is not checked.
Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
---
drivers/net/wireless/rtlwifi/pci.c | 6 ++++++
1 file changed, 6 insertions(+)
No change for V2.
diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 3deacaf..4261e8e 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -743,6 +743,8 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
done:
bufferaddress = (*((dma_addr_t *)skb->cb));
+ if (pci_dma_mapping_error(rtlpci->pdev, bufferaddress))
+ return;
tmp_one = 1;
rtlpriv->cfg->ops->set_desc((u8 *) pdesc, false,
HW_DESC_RXBUFF_ADDR,
@@ -1115,6 +1117,10 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw)
PCI_DMA_FROMDEVICE);
bufferaddress = (*((dma_addr_t *)skb->cb));
+ if (pci_dma_mapping_error(rtlpci->pdev, bufferaddress)) {
+ dev_kfree_skb_any(skb);
+ return 1;
+ }
rtlpriv->cfg->ops->set_desc((u8 *)entry, false,
HW_DESC_RXBUFF_ADDR,
(u8 *)&bufferaddress);
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox