* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Rusty Russell @ 2012-12-03 2:04 UTC (permalink / raw)
To: Jason Wang, mst, krkumar2, virtualization, netdev, linux-kernel
Cc: bhutchings, jwhan, shiyer, kvm
In-Reply-To: <1354011360-39479-3-git-send-email-jasowang@redhat.com>
Jason Wang <jasowang@redhat.com> writes:
> +static const struct ethtool_ops virtnet_ethtool_ops;
> +
> +/*
> + * Converting between virtqueue no. and kernel tx/rx queue no.
> + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> + */
> +static int vq2txq(struct virtqueue *vq)
> +{
> + int index = virtqueue_get_queue_index(vq);
> + return index == 1 ? 0 : (index - 2) / 2;
> +}
> +
> +static int txq2vq(int txq)
> +{
> + return txq ? 2 * txq + 2 : 1;
> +}
> +
> +static int vq2rxq(struct virtqueue *vq)
> +{
> + int index = virtqueue_get_queue_index(vq);
> + return index ? (index - 1) / 2 : 0;
> +}
> +
> +static int rxq2vq(int rxq)
> +{
> + return rxq ? 2 * rxq + 1 : 0;
> +}
> +
I thought MST changed the proposed spec to make the control queue always
the last one, so this logic becomes trivial.
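
To illustrate what "trivial" could look like, here is a minimal sketch. It assumes a layout in which the rx/tx pairs are numbered consecutively from 0 and the control virtqueue follows the last pair; plain integers stand in for struct virtqueue, and cvq_index() is a hypothetical helper, not something from the patch.

```c
/* Sketch only. Assumed layout (control queue last):
 *   0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN, then cvq
 */
static int vq2txq(int vq_index)
{
	return (vq_index - 1) / 2;
}

static int txq2vq(int txq)
{
	return txq * 2 + 1;
}

static int vq2rxq(int vq_index)
{
	return vq_index / 2;
}

static int rxq2vq(int rxq)
{
	return rxq * 2;
}

/* Hypothetical helper: index of the control vq for a given pair count. */
static int cvq_index(int max_queue_pairs)
{
	return 2 * max_queue_pairs;
}
```

With this numbering none of the helpers needs a special case for pair 0.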
> +static int virtnet_set_queues(struct virtnet_info *vi)
> +{
> + struct scatterlist sg;
> + struct virtio_net_ctrl_rfs s;
> + struct net_device *dev = vi->dev;
> +
> + s.virtqueue_pairs = vi->curr_queue_pairs;
> + sg_init_one(&sg, &s, sizeof(s));
> +
> + if (!vi->has_cvq)
> + return -EINVAL;
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> + VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> + dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> + " %d\n", vi->curr_queue_pairs);
> + return -EINVAL;
> + }
Where do we check the VIRTIO_NET_F_RFS bit?
> static int virtnet_probe(struct virtio_device *vdev)
> {
> - int err;
> + int i, err;
> struct net_device *dev;
> struct virtnet_info *vi;
> + u16 curr_queue_pairs;
> +
> + /* Find if host supports multiqueue virtio_net device */
> + err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> + offsetof(struct virtio_net_config,
> + max_virtqueue_pairs), &curr_queue_pairs);
> +
> + /* We need at least 2 queue's */
> + if (err)
> + curr_queue_pairs = 1;
Huh? Just call this queue_pairs. It's not curr_ at all...
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> + vi->has_cvq = true;
> +
> + /* Use single tx/rx queue pair as default */
> + vi->curr_queue_pairs = 1;
> + vi->max_queue_pairs = curr_queue_pairs;
See...
Cheers,
Rusty.
^ permalink raw reply
* [PATCH net-next] tuntap: attach queue 0 before registering netdevice
From: Jason Wang @ 2012-12-03 3:19 UTC (permalink / raw)
To: davem, netdev, linux-kernel, jslaby; +Cc: Jason Wang
We currently attach queue 0 after registering the netdevice, which means
netif_set_real_num_{tx|rx}_queues() is called after registration. Since
tun/tap allows a maximum of 1024 queues, this may inject a huge number of
uevents into userspace, because we create 2048 kobjects and then remove 2046.
Solve this problem by attaching queue 0 and setting the real number of queues
before registering the netdevice.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b44d7b7..cc3f878 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -492,9 +492,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
tun_set_real_num_queues(tun);
- if (tun->numqueues == 1)
- netif_carrier_on(tun->dev);
-
/* device is allowed to go away first, so no need to hold extra
* refcnt.
*/
@@ -1611,6 +1608,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
TUN_USER_FEATURES;
dev->features = dev->hw_features;
+ err = tun_attach(tun, file);
+ if (err < 0)
+ goto err_free_dev;
+
err = register_netdevice(tun->dev);
if (err < 0)
goto err_free_dev;
@@ -1620,9 +1621,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
device_create_file(&tun->dev->dev, &dev_attr_group))
pr_err("Failed to create tun sysfs files\n");
- err = tun_attach(tun, file);
- if (err < 0)
- goto err_free_dev;
+ netif_carrier_on(tun->dev);
}
tun_debug(KERN_INFO, tun, "tun_set_iff\n");
--
1.7.1
^ permalink raw reply related
* [PATCH 3/4 net-next] tg3: PTP - Add the hardware timestamp ioctl
From: Michael Chan @ 2012-12-03 3:42 UTC (permalink / raw)
To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-2-git-send-email-mchan@broadcom.com>
From: Matt Carlson <mcarlson@broadcom.com>
This patch implements the SIOCSHWTSTAMP ioctl as described in
Documentation/networking/timestamping.txt
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 99 +++++++++++++++++++++++++++++++++++
1 files changed, 99 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index a54d194..f6e956c 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -12755,6 +12755,102 @@ static void tg3_self_test(struct net_device *dev, struct ethtool_test *etest,
}
+static int tg3_hwtstamp_ioctl(struct net_device *dev,
+ struct ifreq *ifr, int cmd)
+{
+ struct tg3 *tp = netdev_priv(dev);
+ struct hwtstamp_config stmpconf;
+
+ if (!tg3_flag(tp, PTP_CAPABLE))
+ return -EINVAL;
+
+ if (copy_from_user(&stmpconf, ifr->ifr_data, sizeof(stmpconf)))
+ return -EFAULT;
+
+ if (stmpconf.flags)
+ return -EINVAL;
+
+ switch (stmpconf.tx_type) {
+ case HWTSTAMP_TX_ON:
+ tg3_flag_set(tp, TX_TSTAMP_EN);
+ break;
+ case HWTSTAMP_TX_OFF:
+ tg3_flag_clear(tp, TX_TSTAMP_EN);
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (stmpconf.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ tp->rxptpctl = 0;
+ break;
+ case HWTSTAMP_FILTER_ALL:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+ TG3_RX_PTP_CTL_ALL_V1_EVENTS |
+ TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+ TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+ TG3_RX_PTP_CTL_ALL_V1_EVENTS;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+ TG3_RX_PTP_CTL_SYNC_EVNT;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+ TG3_RX_PTP_CTL_DELAY_REQ;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+ TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+ TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+ TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+ TG3_RX_PTP_CTL_SYNC_EVNT;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+ TG3_RX_PTP_CTL_SYNC_EVNT;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+ TG3_RX_PTP_CTL_SYNC_EVNT;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+ TG3_RX_PTP_CTL_DELAY_REQ;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+ TG3_RX_PTP_CTL_DELAY_REQ;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+ TG3_RX_PTP_CTL_DELAY_REQ;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ if (netif_running(dev) && tp->rxptpctl)
+ tw32(TG3_RX_PTP_CTL,
+ tp->rxptpctl | TG3_RX_PTP_CTL_HWTS_INTERLOCK);
+
+ return copy_to_user(ifr->ifr_data, &stmpconf, sizeof(stmpconf)) ?
+ -EFAULT : 0;
+}
+
static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
struct mii_ioctl_data *data = if_mii(ifr);
@@ -12805,6 +12901,9 @@ static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
return err;
+ case SIOCSHWTSTAMP:
+ return tg3_hwtstamp_ioctl(dev, ifr, cmd);
+
default:
/* do nothing */
break;
--
1.7.1
^ permalink raw reply related
* [PATCH 4/4 net-next] tg3: PTP - Enable the timestamping feature in hardware and fill skb tx/rx timestamps
From: Michael Chan @ 2012-12-03 3:42 UTC (permalink / raw)
To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-3-git-send-email-mchan@broadcom.com>
From: Matt Carlson <mcarlson@broadcom.com>
This patch implements the hardware timestamping as described in
Documentation/networking/timestamping.txt
Update version to 3.128.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 57 +++++++++++++++++++++++++++++++---
1 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index f6e956c..b2ad1c4 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -93,10 +93,10 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
#define DRV_MODULE_NAME "tg3"
#define TG3_MAJ_NUM 3
-#define TG3_MIN_NUM 127
+#define TG3_MIN_NUM 128
#define DRV_MODULE_VERSION \
__stringify(TG3_MAJ_NUM) "." __stringify(TG3_MIN_NUM)
-#define DRV_MODULE_RELDATE "November 14, 2012"
+#define DRV_MODULE_RELDATE "December 02, 2012"
#define RESET_KIND_SHUTDOWN 0
#define RESET_KIND_INIT 1
@@ -5658,6 +5658,14 @@ static const struct ptp_clock_info tg3_ptp_caps = {
.enable = tg3_ptp_enable,
};
+static void tg3_hwclock_to_timestamp(struct tg3 *tp, u64 hwclock,
+ struct skb_shared_hwtstamps *timestamp)
+{
+ memset(timestamp, 0, sizeof(struct skb_shared_hwtstamps));
+ timestamp->hwtstamp = ns_to_ktime((hwclock & TG3_TSTAMP_MASK) +
+ tp->ptp_adjust);
+}
+
static void tg3_ptp_init(struct tg3 *tp)
{
if (!tg3_flag(tp, PTP_CAPABLE))
@@ -5871,6 +5879,16 @@ static void tg3_tx(struct tg3_napi *tnapi)
return;
}
+ if (tnapi->tx_ring[sw_idx].len_flags & TXD_FLAG_HWTSTAMP) {
+ struct skb_shared_hwtstamps timestamp;
+ u64 hwclock = tr32(TG3_TX_TSTAMP_LSB);
+ hwclock |= (u64)tr32(TG3_TX_TSTAMP_MSB) << 32;
+
+ tg3_hwclock_to_timestamp(tp, hwclock, &timestamp);
+
+ skb_tstamp_tx(skb, &timestamp);
+ }
+
pci_unmap_single(tp->pdev,
dma_unmap_addr(ri, mapping),
skb_headlen(skb),
@@ -6138,6 +6156,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
dma_addr_t dma_addr;
u32 opaque_key, desc_idx, *post_ptr;
u8 *data;
+ u64 tstamp = 0;
desc_idx = desc->opaque & RXD_OPAQUE_INDEX_MASK;
opaque_key = desc->opaque & RXD_OPAQUE_RING_MASK;
@@ -6172,6 +6191,14 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
len = ((desc->idx_len & RXD_LEN_MASK) >> RXD_LEN_SHIFT) -
ETH_FCS_LEN;
+ if ((desc->type_flags & RXD_FLAG_PTPSTAT_MASK) ==
+ RXD_FLAG_PTPSTAT_PTPV1 ||
+ (desc->type_flags & RXD_FLAG_PTPSTAT_MASK) ==
+ RXD_FLAG_PTPSTAT_PTPV2) {
+ tstamp = tr32(TG3_RX_TSTAMP_LSB);
+ tstamp |= (u64)tr32(TG3_RX_TSTAMP_MSB) << 32;
+ }
+
if (len > TG3_RX_COPY_THRESH(tp)) {
int skb_size;
unsigned int frag_size;
@@ -6215,6 +6242,10 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
}
skb_put(skb, len);
+ if (tstamp)
+ tg3_hwclock_to_timestamp(tp, tstamp,
+ skb_hwtstamps(skb));
+
if ((tp->dev->features & NETIF_F_RXCSUM) &&
(desc->type_flags & RXD_FLAG_TCPUDP_CSUM) &&
(((desc->ip_tcp_csum & RXD_TCPCSUM_MASK)
@@ -7271,6 +7302,12 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
vlan = vlan_tx_tag_get(skb);
}
+ if ((unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) &&
+ tg3_flag(tp, TX_TSTAMP_EN)) {
+ skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+ base_flags |= TXD_FLAG_HWTSTAMP;
+ }
+
len = skb_headlen(skb);
mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
@@ -9139,9 +9176,15 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
*/
tp->grc_mode |= GRC_MODE_NO_TX_PHDR_CSUM;
- tw32(GRC_MODE,
- tp->grc_mode |
- (GRC_MODE_IRQ_ON_MAC_ATTN | GRC_MODE_HOST_STACKUP));
+ val = GRC_MODE_IRQ_ON_MAC_ATTN | GRC_MODE_HOST_STACKUP;
+ if (tp->rxptpctl)
+ tw32(TG3_RX_PTP_CTL,
+ tp->rxptpctl | TG3_RX_PTP_CTL_HWTS_INTERLOCK);
+
+ if (tg3_flag(tp, PTP_CAPABLE))
+ val |= GRC_MODE_TIME_SYNC_ENABLE;
+
+ tw32(GRC_MODE, tp->grc_mode | val);
/* Setup the timer prescalar register. Clock is always 66Mhz. */
val = tr32(GRC_MISC_CFG);
@@ -16565,6 +16608,10 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
pci_set_drvdata(pdev, dev);
+ if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5719 ||
+ GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5720)
+ tg3_flag_set(tp, PTP_CAPABLE);
+
if (tg3_flag(tp, 5717_PLUS)) {
/* Resume a low-power mode */
tg3_frob_aux_power(tp, false);
--
1.7.1
^ permalink raw reply related
* [PATCH 1/4 net-next] tg3: PTP - Add header definitions, initialization and hw access functions.
From: Michael Chan @ 2012-12-03 3:42 UTC (permalink / raw)
To: davem; +Cc: netdev, nsujir
From: Matt Carlson <mcarlson@broadcom.com>
This patch adds code to register/unregister the ptp clock and write
the reference clock. If a chip reset is performed, the hwclock is
reinitialized with the adjusted kernel time.
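
The reset handling can be pictured with a toy model. This is a sketch, not driver code: hw_ns stands in for the chip's reference clock and adjust_ns for the software offset (tp->ptp_adjust); the struct and function names are invented for illustration. It assumes time steps are accumulated in the software offset and only folded into the hardware counter when the clock is rewritten after a reset.

```c
#include <stdint.h>

/* Toy model of the hardware clock plus a software offset. */
struct clk_model {
	uint64_t hw_ns;     /* value held in the reference clock */
	int64_t adjust_ns;  /* pending software adjustment */
};

/* A time step only touches the software offset. */
void model_adjtime(struct clk_model *c, int64_t delta)
{
	c->adjust_ns += delta;
}

/* The time reported to callers is hardware time plus the offset. */
uint64_t model_gettime(const struct clk_model *c)
{
	return c->hw_ns + (uint64_t)c->adjust_ns;
}

/* After a chip reset the counter is rewritten from system time with the
 * pending offset folded in, and the offset starts again from zero. */
void model_resume(struct clk_model *c, uint64_t system_ns)
{
	c->hw_ns = system_ns + (uint64_t)c->adjust_ns;
	c->adjust_ns = 0;
}
```

In this model a pending step is not lost across a reset; it moves from the software offset into the value written to the hardware counter.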
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/Kconfig | 1 +
drivers/net/ethernet/broadcom/tg3.c | 84 +++++++++++++++++++++++++++++++--
drivers/net/ethernet/broadcom/tg3.h | 60 ++++++++++++++++++++++-
3 files changed, 137 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 4bd416b..f552673 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -102,6 +102,7 @@ config TIGON3
depends on PCI
select PHYLIB
select HWMON
+ select PTP_1588_CLOCK
---help---
This driver supports Broadcom Tigon3 based gigabit Ethernet cards.
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 5cc976d..38047a9 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -54,6 +54,9 @@
#include <asm/byteorder.h>
#include <linux/uaccess.h>
+#include <uapi/linux/net_tstamp.h>
+#include <linux/ptp_clock_kernel.h>
+
#ifdef CONFIG_SPARC
#include <asm/idprom.h>
#include <asm/prom.h>
@@ -5516,6 +5519,57 @@ static int tg3_setup_phy(struct tg3 *tp, int force_reset)
return err;
}
+static void tg3_refclk_write(struct tg3 *tp, u64 newval)
+{
+ tw32(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_STOP);
+ tw32(TG3_EAV_REF_CLCK_LSB, newval & 0xffffffff);
+ tw32(TG3_EAV_REF_CLCK_MSB, newval >> 32);
+ tw32_f(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_RESUME);
+}
+
+static const struct ptp_clock_info tg3_ptp_caps = {
+ .owner = THIS_MODULE,
+ .name = "",
+ .max_adj = 0,
+ .n_alarm = 0,
+ .n_ext_ts = 0,
+ .n_per_out = 0,
+ .pps = 0,
+};
+
+static void tg3_ptp_init(struct tg3 *tp)
+{
+ if (!tg3_flag(tp, PTP_CAPABLE))
+ return;
+
+ /* Initialize the hardware clock to the system time. */
+ tg3_refclk_write(tp, ktime_to_ns(ktime_get_real()));
+ tp->ptp_adjust = 0;
+
+ tp->ptp_info = tg3_ptp_caps;
+ strncpy(tp->ptp_info.name, tp->dev->name, IFNAMSIZ);
+}
+
+static void tg3_ptp_resume(struct tg3 *tp)
+{
+ if (!tg3_flag(tp, PTP_CAPABLE))
+ return;
+
+ tg3_refclk_write(tp, ktime_to_ns(ktime_get_real()) + tp->ptp_adjust);
+ tp->ptp_adjust = 0;
+}
+
+static void tg3_ptp_fini(struct tg3 *tp)
+{
+ if (!tg3_flag(tp, PTP_CAPABLE) ||
+ !tp->ptp_clock)
+ return;
+
+ ptp_clock_unregister(tp->ptp_clock);
+ tp->ptp_clock = NULL;
+ tp->ptp_adjust = 0;
+}
+
static inline int tg3_irq_sync(struct tg3 *tp)
{
return tp->irq_sync;
@@ -6527,6 +6581,8 @@ static inline void tg3_netif_stop(struct tg3 *tp)
static inline void tg3_netif_start(struct tg3 *tp)
{
+ tg3_ptp_resume(tp);
+
/* NOTE: unconditional netif_tx_wake_all_queues is only
* appropriate so long as all callers are assured to
* have free tx slots (such as after tg3_init_hw)
@@ -10364,7 +10420,8 @@ static void tg3_ints_fini(struct tg3 *tp)
tg3_flag_clear(tp, ENABLE_TSS);
}
-static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq)
+static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq,
+ bool init)
{
struct net_device *dev = tp->dev;
int i, err;
@@ -10443,6 +10500,12 @@ static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq)
tg3_flag_set(tp, INIT_COMPLETE);
tg3_enable_ints(tp);
+ if (init)
+ tg3_ptp_init(tp);
+ else
+ tg3_ptp_resume(tp);
+
+
tg3_full_unlock(tp);
netif_tx_start_all_queues(dev);
@@ -10540,11 +10603,19 @@ static int tg3_open(struct net_device *dev)
tg3_full_unlock(tp);
- err = tg3_start(tp, true, true);
+ err = tg3_start(tp, true, true, true);
if (err) {
tg3_frob_aux_power(tp, false);
pci_set_power_state(tp->pdev, PCI_D3hot);
}
+
+ if (tg3_flag(tp, PTP_CAPABLE)) {
+ tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
+ &tp->pdev->dev);
+ if (IS_ERR(tp->ptp_clock))
+ tp->ptp_clock = NULL;
+ }
+
return err;
}
@@ -10552,6 +10623,8 @@ static int tg3_close(struct net_device *dev)
{
struct tg3 *tp = netdev_priv(dev);
+ tg3_ptp_fini(tp);
+
tg3_stop(tp);
/* Clear stats across close / open calls */
@@ -11454,7 +11527,7 @@ static int tg3_set_channels(struct net_device *dev,
tg3_carrier_off(tp);
- tg3_start(tp, true, false);
+ tg3_start(tp, true, false, false);
return 0;
}
@@ -12507,7 +12580,6 @@ static void tg3_self_test(struct net_device *dev, struct ethtool_test *etest,
}
tg3_full_lock(tp, irq_sync);
-
tg3_halt(tp, RESET_KIND_SUSPEND, 1);
err = tg3_nvram_lock(tp);
tg3_halt_cpu(tp, RX_CPU_BASE);
@@ -16598,8 +16670,8 @@ static void tg3_io_resume(struct pci_dev *pdev)
tg3_full_lock(tp, 0);
tg3_flag_set(tp, INIT_COMPLETE);
err = tg3_restart_hw(tp, 1);
- tg3_full_unlock(tp);
if (err) {
+ tg3_full_unlock(tp);
netdev_err(netdev, "Cannot restart hardware after reset.\n");
goto done;
}
@@ -16610,6 +16682,8 @@ static void tg3_io_resume(struct pci_dev *pdev)
tg3_netif_start(tp);
+ tg3_full_unlock(tp);
+
tg3_phy_start(tp);
done:
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 4534804..d330e81 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -772,7 +772,10 @@
#define SG_DIG_MAC_ACK_STATUS 0x00000004
#define SG_DIG_AUTONEG_COMPLETE 0x00000002
#define SG_DIG_AUTONEG_ERROR 0x00000001
-/* 0x5b8 --> 0x600 unused */
+#define TG3_TX_TSTAMP_LSB 0x000005c0
+#define TG3_TX_TSTAMP_MSB 0x000005c4
+#define TG3_TSTAMP_MASK 0x7fffffffffffffff
+/* 0x5c8 --> 0x600 unused */
#define MAC_TX_MAC_STATE_BASE 0x00000600 /* 16 bytes */
#define MAC_RX_MAC_STATE_BASE 0x00000610 /* 20 bytes */
/* 0x624 --> 0x670 unused */
@@ -789,7 +792,36 @@
#define MAC_RSS_HASH_KEY_7 0x0000068c
#define MAC_RSS_HASH_KEY_8 0x00000690
#define MAC_RSS_HASH_KEY_9 0x00000694
-/* 0x698 --> 0x800 unused */
+/* 0x698 --> 0x6b0 unused */
+
+#define TG3_RX_TSTAMP_LSB 0x000006b0
+#define TG3_RX_TSTAMP_MSB 0x000006b4
+/* 0x6b8 --> 0x6c8 unused */
+
+#define TG3_RX_PTP_CTL 0x000006c8
+#define TG3_RX_PTP_CTL_SYNC_EVNT 0x00000001
+#define TG3_RX_PTP_CTL_DELAY_REQ 0x00000002
+#define TG3_RX_PTP_CTL_PDLAY_REQ 0x00000004
+#define TG3_RX_PTP_CTL_PDLAY_RES 0x00000008
+#define TG3_RX_PTP_CTL_ALL_V1_EVENTS (TG3_RX_PTP_CTL_SYNC_EVNT | \
+ TG3_RX_PTP_CTL_DELAY_REQ)
+#define TG3_RX_PTP_CTL_ALL_V2_EVENTS (TG3_RX_PTP_CTL_SYNC_EVNT | \
+ TG3_RX_PTP_CTL_DELAY_REQ | \
+ TG3_RX_PTP_CTL_PDLAY_REQ | \
+ TG3_RX_PTP_CTL_PDLAY_RES)
+#define TG3_RX_PTP_CTL_FOLLOW_UP 0x00000100
+#define TG3_RX_PTP_CTL_DELAY_RES 0x00000200
+#define TG3_RX_PTP_CTL_PDRES_FLW_UP 0x00000400
+#define TG3_RX_PTP_CTL_ANNOUNCE 0x00000800
+#define TG3_RX_PTP_CTL_SIGNALING 0x00001000
+#define TG3_RX_PTP_CTL_MANAGEMENT 0x00002000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN 0x00800000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN 0x01000000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_EN (TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN | \
+ TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN)
+#define TG3_RX_PTP_CTL_RX_PTP_V1_EN 0x02000000
+#define TG3_RX_PTP_CTL_HWTS_INTERLOCK 0x04000000
+/* 0x6cc --> 0x800 unused */
#define MAC_TX_STATS_OCTETS 0x00000800
#define MAC_TX_STATS_RESV1 0x00000804
@@ -1669,6 +1701,7 @@
#define GRC_MODE_HOST_STACKUP 0x00010000
#define GRC_MODE_HOST_SENDBDS 0x00020000
#define GRC_MODE_HTX2B_ENABLE 0x00040000
+#define GRC_MODE_TIME_SYNC_ENABLE 0x00080000
#define GRC_MODE_NO_TX_PHDR_CSUM 0x00100000
#define GRC_MODE_NVRAM_WR_ENABLE 0x00200000
#define GRC_MODE_PCIE_TL_SEL 0x00000000
@@ -1771,7 +1804,17 @@
#define GRC_VCPU_EXT_CTRL_DISABLE_WOL 0x20000000
#define GRC_FASTBOOT_PC 0x00006894 /* 5752, 5755, 5787 */
-/* 0x6c00 --> 0x7000 unused */
+#define TG3_EAV_REF_CLCK_LSB 0x00006900
+#define TG3_EAV_REF_CLCK_MSB 0x00006904
+#define TG3_EAV_REF_CLCK_CTL 0x00006908
+#define TG3_EAV_REF_CLCK_CTL_STOP 0x00000002
+#define TG3_EAV_REF_CLCK_CTL_RESUME 0x00000004
+#define TG3_EAV_REF_CLK_CORRECT_CTL 0x00006928
+#define TG3_EAV_REF_CLK_CORRECT_EN (1 << 31)
+#define TG3_EAV_REF_CLK_CORRECT_NEG (1 << 30)
+
+#define TG3_EAV_REF_CLK_CORRECT_MASK 0xffffff
+/* 0x690c --> 0x7000 unused */
/* NVRAM Control registers */
#define NVRAM_CMD 0x00007000
@@ -2439,6 +2482,7 @@ struct tg3_tx_buffer_desc {
#define TXD_FLAG_IP_FRAG 0x0008
#define TXD_FLAG_JMB_PKT 0x0008
#define TXD_FLAG_IP_FRAG_END 0x0010
+#define TXD_FLAG_HWTSTAMP 0x0020
#define TXD_FLAG_VLAN 0x0040
#define TXD_FLAG_COAL_NOW 0x0080
#define TXD_FLAG_CPU_PRE_DMA 0x0100
@@ -2480,6 +2524,9 @@ struct tg3_rx_buffer_desc {
#define RXD_FLAG_IP_CSUM 0x1000
#define RXD_FLAG_TCPUDP_CSUM 0x2000
#define RXD_FLAG_IS_TCP 0x4000
+#define RXD_FLAG_PTPSTAT_MASK 0x0210
+#define RXD_FLAG_PTPSTAT_PTPV1 0x0010
+#define RXD_FLAG_PTPSTAT_PTPV2 0x0200
u32 ip_tcp_csum;
#define RXD_IPCSUM_MASK 0xffff0000
@@ -2970,9 +3017,11 @@ enum TG3_FLAGS {
TG3_FLAG_USE_JUMBO_BDFLAG,
TG3_FLAG_L1PLLPD_EN,
TG3_FLAG_APE_HAS_NCSI,
+ TG3_FLAG_TX_TSTAMP_EN,
TG3_FLAG_4K_FIFO_LIMIT,
TG3_FLAG_5719_RDMA_BUG,
TG3_FLAG_RESET_TASK_PENDING,
+ TG3_FLAG_PTP_CAPABLE,
TG3_FLAG_5705_PLUS,
TG3_FLAG_IS_5788,
TG3_FLAG_5750_PLUS,
@@ -3041,6 +3090,10 @@ struct tg3 {
u32 coal_now;
u32 msg_enable;
+ struct ptp_clock_info ptp_info;
+ struct ptp_clock *ptp_clock;
+ s64 ptp_adjust;
+
/* begin "tx thread" cacheline section */
void (*write32_tx_mbox) (struct tg3 *, u32,
u32);
@@ -3108,6 +3161,7 @@ struct tg3 {
u32 dma_rwctrl;
u32 coalesce_mode;
u32 pwrmgmt_thresh;
+ u32 rxptpctl;
/* PCI block */
u32 pci_chip_rev_id;
--
1.7.1
^ permalink raw reply related
* [PATCH 2/4 net-next] tg3: PTP - Implement the ptp api and ethtool functions
From: Michael Chan @ 2012-12-03 3:42 UTC (permalink / raw)
To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-1-git-send-email-mchan@broadcom.com>
From: Matt Carlson <mcarlson@broadcom.com>
This patch updates the ptp_caps structure with implementation functions.
All the basic clock operations as described in
Documentation/ptp/ptp.txt are supported.
Frequency adjustment is performed using hardware with a 24 bit
accumulator and a programmable correction value. On each clk, the
correction value gets added to the accumulator and when it overflows,
the time counter is incremented/decremented and the accumulator reset.
So conversion from ppb to correction value is
ppb * (1 << 24) / 1000000000
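
A small worked sketch of that conversion follows. The function name ppb_to_correction() and the CORRECT_MASK macro are illustrative only; the mask mirrors the 24-bit correction field, and the sign is assumed to be carried separately, as the driver does with a dedicated NEG bit.

```c
#include <stdint.h>

/* 24-bit correction field (illustrative name). */
#define CORRECT_MASK 0xffffffu

/* Magnitude of the correction value for a frequency offset in ppb:
 *   |ppb| * 2^24 / 10^9, truncated and masked to 24 bits.
 * The direction of the adjustment is not encoded here. */
uint32_t ppb_to_correction(int32_t ppb)
{
	uint64_t mag = ppb < 0 ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;

	return (uint32_t)((mag << 24) / 1000000000ULL) & CORRECT_MASK;
}
```

Under these assumptions 250000000 ppb maps to 4194304 (0x400000) and 1000000 ppb maps to 16777. Offsets smaller than 60 ppb truncate to 0, so the step size of this scheme is roughly 60 ppb.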
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 125 ++++++++++++++++++++++++++++++++++-
1 files changed, 123 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 38047a9..a54d194 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -5519,6 +5519,14 @@ static int tg3_setup_phy(struct tg3 *tp, int force_reset)
return err;
}
+
+static u64 tg3_refclk_read(struct tg3 *tp)
+{
+ u64 stamp = tr32(TG3_EAV_REF_CLCK_LSB);
+
+ return stamp | (u64) tr32(TG3_EAV_REF_CLCK_MSB) << 32;
+}
+
static void tg3_refclk_write(struct tg3 *tp, u64 newval)
{
tw32(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_STOP);
@@ -5527,14 +5535,127 @@ static void tg3_refclk_write(struct tg3 *tp, u64 newval)
tw32_f(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_RESUME);
}
+static inline void tg3_full_lock(struct tg3 *tp, int irq_sync);
+static inline void tg3_full_unlock(struct tg3 *tp);
+static int tg3_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
+{
+ struct tg3 *tp = netdev_priv(dev);
+
+ info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
+ SOF_TIMESTAMPING_RX_SOFTWARE |
+ SOF_TIMESTAMPING_SOFTWARE |
+ SOF_TIMESTAMPING_TX_HARDWARE |
+ SOF_TIMESTAMPING_RX_HARDWARE |
+ SOF_TIMESTAMPING_RAW_HARDWARE;
+
+ if (tp->ptp_clock)
+ info->phc_index = ptp_clock_index(tp->ptp_clock);
+ else
+ info->phc_index = -1;
+
+ info->tx_types = (1 << HWTSTAMP_TX_OFF) |
+ (1 << HWTSTAMP_TX_ON);
+
+ info->rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
+ (1 << HWTSTAMP_FILTER_ALL);
+ return 0;
+}
+
+static int tg3_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
+{
+ struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+ bool neg_adj = false;
+ u32 correction = 0;
+
+ if (ppb < 0) {
+ neg_adj = true;
+ ppb = -ppb;
+ }
+
+ /* Frequency adjustment is performed using hardware with a 24 bit
+ * accumulator and a programmable correction value. On each clk, the
+ * correction value gets added to the accumulator and when it
+ * overflows, the time counter is incremented/decremented.
+ *
+ * So conversion from ppb to correction value is
+ * ppb * (1 << 24) / 1000000000
+ */
+ correction = div_u64((u64)ppb * (1 << 24), 1000000000ULL) &
+ TG3_EAV_REF_CLK_CORRECT_MASK;
+
+ tg3_full_lock(tp, 0);
+
+ if (correction)
+ tw32(TG3_EAV_REF_CLK_CORRECT_CTL,
+ TG3_EAV_REF_CLK_CORRECT_EN |
+ (neg_adj ? TG3_EAV_REF_CLK_CORRECT_NEG : 0) | correction);
+ else
+ tw32(TG3_EAV_REF_CLK_CORRECT_CTL, 0);
+
+ tg3_full_unlock(tp);
+
+ return 0;
+}
+
+static int tg3_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+ struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+ tp->ptp_adjust += delta;
+ return 0;
+}
+
+static int tg3_ptp_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
+{
+ u64 ns;
+ u32 remainder;
+ struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+
+ tg3_full_lock(tp, 0);
+ ns = tg3_refclk_read(tp);
+ tg3_full_unlock(tp);
+ ns += tp->ptp_adjust;
+
+ ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
+ ts->tv_nsec = remainder;
+
+ return 0;
+}
+
+static int tg3_ptp_settime(struct ptp_clock_info *ptp,
+ const struct timespec *ts)
+{
+ u64 ns;
+ struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+
+ ns = timespec_to_ns(ts);
+
+ tg3_full_lock(tp, 0);
+ tg3_refclk_write(tp, ns);
+ tg3_full_unlock(tp);
+ tp->ptp_adjust = 0;
+
+ return 0;
+}
+
+static int tg3_ptp_enable(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *rq, int on)
+{
+ return -EOPNOTSUPP;
+}
+
static const struct ptp_clock_info tg3_ptp_caps = {
.owner = THIS_MODULE,
.name = "",
- .max_adj = 0,
+ .max_adj = 250000000,
.n_alarm = 0,
.n_ext_ts = 0,
.n_per_out = 0,
.pps = 0,
+ .adjfreq = tg3_ptp_adjfreq,
+ .adjtime = tg3_ptp_adjtime,
+ .gettime = tg3_ptp_gettime,
+ .settime = tg3_ptp_settime,
+ .enable = tg3_ptp_enable,
};
static void tg3_ptp_init(struct tg3 *tp)
@@ -12785,7 +12906,7 @@ static const struct ethtool_ops tg3_ethtool_ops = {
.set_rxfh_indir = tg3_set_rxfh_indir,
.get_channels = tg3_get_channels,
.set_channels = tg3_set_channels,
- .get_ts_info = ethtool_op_get_ts_info,
+ .get_ts_info = tg3_get_ts_info,
};
static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
--
1.7.1
^ permalink raw reply related
* Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info
From: Jason Wang @ 2012-12-03 5:15 UTC (permalink / raw)
To: Rusty Russell
Cc: krkumar2, kvm, mst, netdev, linux-kernel, virtualization,
bhutchings, jwhan, shiyer
In-Reply-To: <87y5hfj3vl.fsf@rustcorp.com.au>
On Monday, December 03, 2012 12:25:42 PM Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > To support multiqueue transmitq/receiveq, the first step is to separate
> > queue related structures from virtnet_info. This patch introduces the
> > send_queue and receive_queue structures and uses pointers to them as the
> > parameter in functions handling sending/receiving.
>
> OK, seems like a straightforward xform: a few nit-picks:
> > +/* Internal representation of a receive virtqueue */
> > +struct receive_queue {
> > + /* Virtqueue associated with this receive_queue */
> > + struct virtqueue *vq;
> > +
> > + struct napi_struct napi;
> > +
> > + /* Number of input buffers, and max we've ever had. */
> > + unsigned int num, max;
>
> Weird whitespace here.
>
Oh, yes, will fix it.
> > +
> > + /* Work struct for refilling if we run low on memory. */
> > + struct delayed_work refill;
>
> I can't really see the justification for a refill per queue. Just have
> one work iterate all the queues if it happens, unless it happens often
> (in which case, we need to look harder at this anyway).
But during this kind of iteration, we may need to enable/disable napi on
each receive queue regardless of whether that queue has much to refill.
This may add extra latency.
>
> > struct virtnet_info {
> >
> > struct virtio_device *vdev;
> >
> > - struct virtqueue *rvq, *svq, *cvq;
> > + struct virtqueue *cvq;
> >
> > struct net_device *dev;
> > struct napi_struct napi;
>
> You leave napi here, and take it away in the next patch. I think it's
> supposed to go away now.
Yes, will remove it.
Thanks
>
> Cheers,
> Rusty.
^ permalink raw reply
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03 5:47 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <20121202160631.GA27761@redhat.com>
On Sunday, December 02, 2012 06:06:31 PM Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> > This adds multiqueue support to the virtio_net driver. In multiqueue
> > mode, the driver expects the number of queue pairs to be equal to the
> > number of vcpus. To eliminate the contention between vcpus and
> > virtqueues, per-cpu virtqueue pairs are implemented by:
> >
> > - selecting the txq based on the smp processor id.
> > - setting the smp affinity hint to the vcpu that owns the queue pair.
> >
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >
> > drivers/net/virtio_net.c        | 454 ++++++++++++++++++++++++++++++---------
> > include/uapi/linux/virtio_net.h |  16 ++
> > 2 files changed, 371 insertions(+), 99 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 7975133..bcaa6e5 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -84,17 +84,25 @@ struct virtnet_info {
> >
> > struct virtio_device *vdev;
> > struct virtqueue *cvq;
> > struct net_device *dev;
> >
> > - struct napi_struct napi;
> > - struct send_queue sq;
> > - struct receive_queue rq;
> > + struct send_queue *sq;
> > + struct receive_queue *rq;
> >
> > unsigned int status;
> >
> > + /* Max # of queue pairs supported by the device */
> > + u16 max_queue_pairs;
> > +
> > + /* # of queue pairs currently used by the driver */
> > + u16 curr_queue_pairs;
> > +
> >
> > /* I like... big packets and I cannot lie! */
> > bool big_packets;
> >
> > /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > bool mergeable_rx_bufs;
> >
> > + /* Has control virtqueue */
> > + bool has_cvq;
> > +
> >
> > /* enable config space updates */
> > bool config_enable;
> >
> > @@ -126,6 +134,34 @@ struct padded_vnet_hdr {
> >
> > char padding[6];
> >
> > };
> >
> > +static const struct ethtool_ops virtnet_ethtool_ops;
> > +
> > +/*
> > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > + */
>
> Weird, this isn't what spec v5 says: it says
> 0:rx0 1:tx0 2: rx1 3: tx1 .... vcq
> We can change the spec to match but keeping all rx/tx
> together seems a bit prettier?
Oh, I missed checking this part in v5. Thinking about it: if we change the
location of the cvq, we may break support for legacy guests that only support
a single queue. Suppose we start a VM with 2 queue pairs but boot a
single-queue legacy guest: it may think vq 2 is the cvq when it is in fact rx1.
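To make the concern concrete, here is a minimal standalone sketch (illustration
only, not part of the patch) of the two index layouts under discussion. The
helper names and LEGACY_CVQ_INDEX are hypothetical; the v7 formulas are copied
from the patch, and the "control vq last" formulas are my reading of the
"0:rx0 1:tx0 2:rx1 3:tx1 ... cvq" layout quoted above.

```c
/* Illustrative sketch only; helper names are hypothetical. */

/* v7 layout: 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN */
static int v7_rxq2vq(int rxq) { return rxq ? 2 * rxq + 1 : 0; }
static int v7_txq2vq(int txq) { return txq ? 2 * txq + 2 : 1; }

/* "Control vq last" layout: 0:rx0 1:tx0 2:rx1 3:tx1 ... then cvq */
static int last_rxq2vq(int rxq) { return 2 * rxq; }
static int last_txq2vq(int txq) { return 2 * txq + 1; }
static int last_cvq(int max_pairs) { return 2 * max_pairs; }

/* Index at which a single-queue legacy guest expects the control vq. */
enum { LEGACY_CVQ_INDEX = 2 };
```

With two queue pairs, last_rxq2vq(1) evaluates to 2, which is exactly
LEGACY_CVQ_INDEX, while in the v7 layout index 2 is never an rx or tx queue.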
>
> > +static int vq2txq(struct virtqueue *vq)
> > +{
> > + int index = virtqueue_get_queue_index(vq);
> > + return index == 1 ? 0 : (index - 2) / 2;
> > +}
> > +
> > +static int txq2vq(int txq)
> > +{
> > + return txq ? 2 * txq + 2 : 1;
> > +}
> > +
> > +static int vq2rxq(struct virtqueue *vq)
> > +{
> > + int index = virtqueue_get_queue_index(vq);
> > + return index ? (index - 1) / 2 : 0;
> > +}
> > +
> > +static int rxq2vq(int rxq)
> > +{
> > + return rxq ? 2 * rxq + 1 : 0;
> > +}
> > +
> >
> > static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
> > {
> >
> > return (struct skb_vnet_hdr *)skb->cb;
> >
> > @@ -166,7 +202,7 @@ static void skb_xmit_done(struct virtqueue *vq)
> >
> > virtqueue_disable_cb(vq);
> >
> > /* We were probably waiting for more output buffers. */
> >
> > - netif_wake_queue(vi->dev);
> > + netif_wake_subqueue(vi->dev, vq2txq(vq));
> >
> > }
> >
> > static void set_skb_frag(struct sk_buff *skb, struct page *page,
> >
> > @@ -503,7 +539,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
> > static void skb_recv_done(struct virtqueue *rvq)
> > {
> >
> > struct virtnet_info *vi = rvq->vdev->priv;
> >
> > - struct receive_queue *rq = &vi->rq;
> > + struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
> >
> > /* Schedule NAPI, Suppress further interrupts if successful. */
> > if (napi_schedule_prep(&rq->napi)) {
> >
> > @@ -650,7 +686,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > {
> >
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > - struct send_queue *sq = &vi->sq;
> > + int qnum = skb_get_queue_mapping(skb);
> > + struct send_queue *sq = &vi->sq[qnum];
> >
> > int capacity;
> >
> > /* Free up any pending old buffers before queueing new ones. */
> >
> > @@ -664,13 +701,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > if (likely(capacity == -ENOMEM)) {
> >
> > if (net_ratelimit())
> >
> > dev_warn(&dev->dev,
> >
> > - "TX queue failure: out of memory\n");
> > + "TXQ (%d) failure: out of memory\n",
> > + qnum);
> >
> > } else {
> >
> > dev->stats.tx_fifo_errors++;
> > if (net_ratelimit())
> >
> > dev_warn(&dev->dev,
> >
> > - "Unexpected TX queue failure: %d\n",
> > - capacity);
> > + "Unexpected TXQ (%d) failure: %d\n",
> > + qnum, capacity);
> >
> > }
> > dev->stats.tx_dropped++;
> > kfree_skb(skb);
> >
> > @@ -685,12 +723,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > /* Apparently nice girls don't return TX_BUSY; stop the queue
> >
> > * before it gets out of hand. Naturally, this wastes entries. */
> >
> > if (capacity < 2+MAX_SKB_FRAGS) {
> >
> > - netif_stop_queue(dev);
> > + netif_stop_subqueue(dev, qnum);
> >
> > if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >
> > /* More just got used, free them then recheck. */
> > capacity += free_old_xmit_skbs(sq);
> > if (capacity >= 2+MAX_SKB_FRAGS) {
> >
> > - netif_start_queue(dev);
> > + netif_start_subqueue(dev, qnum);
> >
> > virtqueue_disable_cb(sq->vq);
> >
> > }
> >
> > }
> >
> > @@ -758,23 +796,13 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> > static void virtnet_netpoll(struct net_device *dev)
> > {
> >
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > + int i;
> >
> > - napi_schedule(&vi->rq.napi);
> > + for (i = 0; i < vi->curr_queue_pairs; i++)
> > + napi_schedule(&vi->rq[i].napi);
> >
> > }
> > #endif
> >
> > -static int virtnet_open(struct net_device *dev)
> > -{
> > - struct virtnet_info *vi = netdev_priv(dev);
> > -
> > - /* Make sure we have some buffers: if oom use wq. */
> > - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> > - schedule_delayed_work(&vi->rq.refill, 0);
> > -
> > - virtnet_napi_enable(&vi->rq);
> > - return 0;
> > -}
> > -
> >
> > /*
> >
> > * Send command via the control virtqueue and check status. Commands
> > * supported by the hypervisor, as indicated by feature bits, should
> >
> > @@ -830,13 +858,53 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
> > rtnl_unlock();
> >
> > }
> >
> > +static int virtnet_set_queues(struct virtnet_info *vi)
> > +{
> > + struct scatterlist sg;
> > + struct virtio_net_ctrl_rfs s;
> > + struct net_device *dev = vi->dev;
> > +
> > + s.virtqueue_pairs = vi->curr_queue_pairs;
> > + sg_init_one(&sg, &s, sizeof(s));
> > +
> > + if (!vi->has_cvq)
> > + return -EINVAL;
> > +
> > + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> > + VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> > + dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> > + " %d\n", vi->curr_queue_pairs);
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int virtnet_open(struct net_device *dev)
> > +{
> > + struct virtnet_info *vi = netdev_priv(dev);
> > + int i;
> > +
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + /* Make sure we have some buffers: if oom use wq. */
> > + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> > + schedule_delayed_work(&vi->rq[i].refill, 0);
> > + virtnet_napi_enable(&vi->rq[i]);
> > + }
> > +
> > + return 0;
> > +}
> > +
> >
> > static int virtnet_close(struct net_device *dev)
> > {
> >
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > + int i;
> >
> > /* Make sure refill_work doesn't re-enable napi! */
> >
> > - cancel_delayed_work_sync(&vi->rq.refill);
> > - napi_disable(&vi->rq.napi);
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + cancel_delayed_work_sync(&vi->rq[i].refill);
> > + napi_disable(&vi->rq[i].napi);
> > + }
> >
> > return 0;
> >
> > }
> >
> > @@ -948,8 +1016,8 @@ static void virtnet_get_ringparam(struct net_device *dev,
> > {
> >
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > - ring->rx_max_pending = virtqueue_get_vring_size(vi->rq.vq);
> > - ring->tx_max_pending = virtqueue_get_vring_size(vi->sq.vq);
> > + ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq);
> > + ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq);
> >
> > ring->rx_pending = ring->rx_max_pending;
> > ring->tx_pending = ring->tx_max_pending;
> >
> > }
> >
> > @@ -967,12 +1035,6 @@ static void virtnet_get_drvinfo(struct net_device *dev,
> > }
> >
> > -static const struct ethtool_ops virtnet_ethtool_ops = {
> > - .get_drvinfo = virtnet_get_drvinfo,
> > - .get_link = ethtool_op_get_link,
> > - .get_ringparam = virtnet_get_ringparam,
> > -};
> > -
> >
> > #define MIN_MTU 68
> > #define MAX_MTU 65535
> >
> > @@ -984,6 +1046,20 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> > return 0;
> >
> > }
> >
> > +/* To avoid contending a lock hold by a vcpu who would exit to host, select the
> > + * txq based on the processor id.
> > + */
> > +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> > +{
> > + int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> > + smp_processor_id();
> > +
> > + while (unlikely(txq >= dev->real_num_tx_queues))
> > + txq -= dev->real_num_tx_queues;
> > +
> > + return txq;
> > +}
> > +
> >
> > static const struct net_device_ops virtnet_netdev = {
> >
> > .ndo_open = virtnet_open,
> > .ndo_stop = virtnet_close,
> >
> > @@ -995,6 +1071,7 @@ static const struct net_device_ops virtnet_netdev = {
> >
> > .ndo_get_stats64 = virtnet_stats,
> > .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> > .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> >
> > + .ndo_select_queue = virtnet_select_queue,
> >
> > #ifdef CONFIG_NET_POLL_CONTROLLER
> >
> > .ndo_poll_controller = virtnet_netpoll,
> >
> > #endif
> >
> > @@ -1030,10 +1107,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
> > if (vi->status & VIRTIO_NET_S_LINK_UP) {
> >
> > netif_carrier_on(vi->dev);
> >
> > - netif_wake_queue(vi->dev);
> > + netif_tx_wake_all_queues(vi->dev);
> >
> > } else {
> >
> > netif_carrier_off(vi->dev);
> >
> > - netif_stop_queue(vi->dev);
> > + netif_tx_stop_all_queues(vi->dev);
> >
> > }
> >
> > done:
> > mutex_unlock(&vi->config_lock);
> >
> > @@ -1046,41 +1123,212 @@ static void virtnet_config_changed(struct virtio_device *vdev)
> > schedule_work(&vi->config_work);
> >
> > }
> >
> > -static int init_vqs(struct virtnet_info *vi)
> > +static void free_receive_bufs(struct virtnet_info *vi)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + while (vi->rq[i].pages)
> > + __free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> > + }
> > +}
> > +
> > +/* Free memory allocated for send and receive queues */
> > +static void virtnet_free_queues(struct virtnet_info *vi)
> >
> > {
> >
> > - struct virtqueue *vqs[3];
> > - vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
> > - const char *names[] = { "input", "output", "control" };
> > - int nvqs, err;
> > + kfree(vi->rq);
> > + vi->rq = NULL;
> > + kfree(vi->sq);
> > + vi->sq = NULL;
> > +}
> >
> > - /* We expect two virtqueues, receive then send,
> > - * and optionally control. */
> > - nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
> > +static void free_unused_bufs(struct virtnet_info *vi)
> > +{
> > + void *buf;
> > + int i;
> >
> > - err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
> > - if (err)
> > - return err;
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + struct virtqueue *vq = vi->sq[i].vq;
> > + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > + dev_kfree_skb(buf);
> > + }
> >
> > - vi->rq.vq = vqs[0];
> > - vi->sq.vq = vqs[1];
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + struct virtqueue *vq = vi->rq[i].vq;
> >
> > - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> > - vi->cvq = vqs[2];
> > + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > + if (vi->mergeable_rx_bufs || vi->big_packets)
> > + give_pages(&vi->rq[i], buf);
> > + else
> > + dev_kfree_skb(buf);
> > + --vi->rq[i].num;
> > + }
> > + BUG_ON(vi->rq[i].num != 0);
> > + }
> > +}
> >
> > +static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + int cpu = set ? i : -1;
> > + virtqueue_set_affinity(vi->rq[i].vq, cpu);
> > + virtqueue_set_affinity(vi->sq[i].vq, cpu);
> > + }
> > +}
> > +
> > +static void virtnet_del_vqs(struct virtnet_info *vi)
> > +{
> > + struct virtio_device *vdev = vi->vdev;
> > +
> > + virtnet_set_affinity(vi, false);
> > +
> > + vdev->config->del_vqs(vdev);
> > +
> > + virtnet_free_queues(vi);
> > +}
> > +
> > +static int virtnet_find_vqs(struct virtnet_info *vi)
> > +{
> > + vq_callback_t **callbacks;
> > + struct virtqueue **vqs;
> > + int ret = -ENOMEM;
> > + int i, total_vqs;
> > + char **names;
> > +
> > + /*
> > + * We expect 1 RX virtqueue followed by 1 TX virtqueue, followd by
> > + * possible control virtqueue, followed by RX/TX N-1 queue pairs used
> > + * in multiqueue mode.
> > + */
> > + total_vqs = vi->max_queue_pairs * 2 +
> > + virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
> > +
> > + /* Allocate space for find_vqs parameters */
> > + vqs = kzalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
> > + callbacks = kzalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
> > + if (!vqs || !callbacks)
> > + goto err_mem;
> > + names = kzalloc(total_vqs * sizeof(*names), GFP_KERNEL);
> > + if (!names)
> > + goto err_mem;
> > +
> > + /* Parameters for control virtqueue, if any */
> > + if (vi->has_cvq) {
> > + callbacks[2] = NULL;
> > + names[2] = kasprintf(GFP_KERNEL, "control");
> > + }
> > +
> > + /* Allocate/initialize parameters for send/receive virtqueues */
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + callbacks[rxq2vq(i)] = skb_recv_done;
> > + callbacks[txq2vq(i)] = skb_xmit_done;
> > + names[rxq2vq(i)] = kasprintf(GFP_KERNEL, "input.%d", i);
> > + names[txq2vq(i)] = kasprintf(GFP_KERNEL, "output.%d", i);
> > + }
> > +
> > + ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> > + (const char **)names);
> > + if (ret)
> > + goto err_names;
> > +
> > + if (vi->has_cvq) {
> > + vi->cvq = vqs[2];
> >
> > if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
> >
> > vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
> >
> > }
> >
> > +
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + vi->rq[i].vq = vqs[rxq2vq(i)];
> > + vi->sq[i].vq = vqs[txq2vq(i)];
> > + }
> > +
> > + kfree(callbacks);
> > + kfree(vqs);
> > +
> > + return 0;
> > +
> > +err_names:
> > + for (i = 0; i < total_vqs * 2; i ++)
> > + kfree(names[i]);
> > + kfree(names);
> > +
> > +err_mem:
> > + kfree(callbacks);
> > + kfree(vqs);
> > +
> > + return ret;
> > +}
> > +
> > +static int virtnet_alloc_queues(struct virtnet_info *vi)
> > +{
> > + int i;
> > +
> > + vi->sq = kzalloc(sizeof(vi->sq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> > + vi->rq = kzalloc(sizeof(vi->rq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> > + if (!vi->rq || !vi->sq)
> > + goto err;
> > +
> > + /* setup initial receive and send queue parameters */
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + vi->rq[i].pages = NULL;
> > + INIT_DELAYED_WORK(&vi->rq[i].refill, refill_work);
> > + netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> > + napi_weight);
> > +
> > + sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> > + sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
> > + }
> > +
> > +
> >
> > return 0;
> >
> > +
> > +err:
> > + virtnet_free_queues(vi);
> > + return -ENOMEM;
> > +}
> > +
> > +static int init_vqs(struct virtnet_info *vi)
> > +{
> > + int ret;
> > +
> > + /* Allocate send & receive queues */
> > + ret = virtnet_alloc_queues(vi);
> > + if (ret)
> > + goto err;
> > +
> > + ret = virtnet_find_vqs(vi);
> > + if (ret)
> > + goto err_free;
> > +
> > + virtnet_set_affinity(vi, true);
> > + return 0;
> > +
> > +err_free:
> > + virtnet_free_queues(vi);
> > +err:
> > + return ret;
> >
> > }
> >
> > static int virtnet_probe(struct virtio_device *vdev)
> > {
> >
> > - int err;
> > + int i, err;
> >
> > struct net_device *dev;
> > struct virtnet_info *vi;
> >
> > + u16 curr_queue_pairs;
>
> Probably a good idea to rename this max_queue_pairs.
Sure.
>
> > +
> > + /* Find if host supports multiqueue virtio_net device */
> > + err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> > + offsetof(struct virtio_net_config,
> > + max_virtqueue_pairs), &curr_queue_pairs);
> > +
> > + /* We need at least 2 queue's */
> > + if (err)
> > + curr_queue_pairs = 1;
>
> Let's also validate against VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN
> and VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX.
Ok.
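One possible shape for that validation, as a standalone sketch only. The
function name sanitize_max_queue_pairs() is hypothetical, and falling back to a
single queue pair on an out-of-range value is an assumption rather than
something decided in this thread. The two limits mirror
VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN and VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX from the
patch.

```c
#include <stdint.h>

#define VQ_PAIRS_MIN 1       /* VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN in the patch */
#define VQ_PAIRS_MAX 0x8000  /* VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX in the patch */

/*
 * read_err: non-zero when max_virtqueue_pairs could not be read
 *           (e.g. the feature is not offered by the host).
 * reported: value read from the device config space.
 * Returns the number of queue pairs the driver should allocate.
 */
static uint16_t sanitize_max_queue_pairs(int read_err, uint16_t reported)
{
	if (read_err)
		return 1;	/* no multiqueue: single rx/tx pair */
	if (reported < VQ_PAIRS_MIN || reported > VQ_PAIRS_MAX)
		return 1;	/* assumed fallback for a bogus device value */
	return reported;
}
```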
>
> > /* Allocate ourselves a network device with room for our info */
> >
> > - dev = alloc_etherdev(sizeof(struct virtnet_info));
> > + dev = alloc_etherdev_mq(sizeof(struct virtnet_info), curr_queue_pairs);
> >
> > if (!dev)
> >
> > return -ENOMEM;
> >
> > @@ -1126,22 +1374,17 @@ static int virtnet_probe(struct virtio_device *vdev)
> > /* Set up our device-specific information */
> > vi = netdev_priv(dev);
> >
> > - netif_napi_add(dev, &vi->rq.napi, virtnet_poll, napi_weight);
> >
> > vi->dev = dev;
> > vi->vdev = vdev;
> > vdev->priv = vi;
> >
> > - vi->rq.pages = NULL;
> >
> > vi->stats = alloc_percpu(struct virtnet_stats);
> > err = -ENOMEM;
> > if (vi->stats == NULL)
> >
> > goto free;
> >
> > - INIT_DELAYED_WORK(&vi->rq.refill, refill_work);
> >
> > mutex_init(&vi->config_lock);
> > vi->config_enable = true;
> > INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> >
> > - sg_init_table(vi->rq.sg, ARRAY_SIZE(vi->rq.sg));
> > - sg_init_table(vi->sq.sg, ARRAY_SIZE(vi->sq.sg));
> >
> > /* If we can receive ANY GSO packets, we must allocate large ones. */
> > if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >
> > @@ -1152,10 +1395,21 @@ static int virtnet_probe(struct virtio_device *vdev)
> > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> >
> > vi->mergeable_rx_bufs = true;
> >
> > + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> > + vi->has_cvq = true;
> > +
> > + /* Use single tx/rx queue pair as default */
> > + vi->curr_queue_pairs = 1;
> > + vi->max_queue_pairs = curr_queue_pairs;
> > +
> > + /* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> >
> > err = init_vqs(vi);
> > if (err)
> >
> > goto free_stats;
> >
> > + netif_set_real_num_tx_queues(dev, 1);
> > + netif_set_real_num_rx_queues(dev, 1);
> > +
> >
> > err = register_netdev(dev);
> > if (err) {
> >
> > pr_debug("virtio_net: registering device failed\n");
> >
> > @@ -1163,12 +1417,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> > }
> >
> > /* Last of all, set up some receive buffers. */
> >
> > - try_fill_recv(&vi->rq, GFP_KERNEL);
> > -
> > - /* If we didn't even get one input buffer, we're useless. */
> > - if (vi->rq.num == 0) {
> > - err = -ENOMEM;
> > - goto unregister;
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + try_fill_recv(&vi->rq[i], GFP_KERNEL);
> > +
> > + /* If we didn't even get one input buffer, we're useless. */
> > + if (vi->rq[i].num == 0) {
> > + free_unused_bufs(vi);
> > + err = -ENOMEM;
> > + goto free_recv_bufs;
> > + }
> >
> > }
> >
> > /* Assume link up if device can't report link status,
> >
> > @@ -1181,13 +1438,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> > netif_carrier_on(dev);
> >
> > }
> >
> > - pr_debug("virtnet: registered device %s\n", dev->name);
> > + pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> > + dev->name, curr_queue_pairs);
> > +
> >
> > return 0;
> >
> > -unregister:
> > +free_recv_bufs:
> > + free_receive_bufs(vi);
> >
> > unregister_netdev(dev);
> >
> > +
> >
> > free_vqs:
> > - vdev->config->del_vqs(vdev);
> > + for (i = 0; i <curr_queue_pairs; i++)
> > + cancel_delayed_work_sync(&vi->rq[i].refill);
> > + virtnet_del_vqs(vi);
> > +
> >
> > free_stats:
> > free_percpu(vi->stats);
> >
> > free:
> > @@ -1195,28 +1459,6 @@ free:
> > return err;
> >
> > }
> >
> > -static void free_unused_bufs(struct virtnet_info *vi)
> > -{
> > - void *buf;
> > - while (1) {
> > - buf = virtqueue_detach_unused_buf(vi->sq.vq);
> > - if (!buf)
> > - break;
> > - dev_kfree_skb(buf);
> > - }
> > - while (1) {
> > - buf = virtqueue_detach_unused_buf(vi->rq.vq);
> > - if (!buf)
> > - break;
> > - if (vi->mergeable_rx_bufs || vi->big_packets)
> > - give_pages(&vi->rq, buf);
> > - else
> > - dev_kfree_skb(buf);
> > - --vi->rq.num;
> > - }
> > - BUG_ON(vi->rq.num != 0);
> > -}
> > -
> >
> > static void remove_vq_common(struct virtnet_info *vi)
> > {
> >
> > vi->vdev->config->reset(vi->vdev);
> >
> > @@ -1224,10 +1466,9 @@ static void remove_vq_common(struct virtnet_info *vi)
> > /* Free unused buffers in both send and recv, if any. */
> > free_unused_bufs(vi);
> >
> > - vi->vdev->config->del_vqs(vi->vdev);
> > + free_receive_bufs(vi);
> >
> > - while (vi->rq.pages)
> > - __free_pages(get_a_page(&vi->rq, GFP_KERNEL), 0);
> > + virtnet_del_vqs(vi);
> >
> > }
> >
> > static void __devexit virtnet_remove(struct virtio_device *vdev)
> >
> > @@ -1253,6 +1494,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
> > static int virtnet_freeze(struct virtio_device *vdev)
> > {
> >
> > struct virtnet_info *vi = vdev->priv;
> >
> > + int i;
> >
> > /* Prevent config work handler from accessing the device */
> > mutex_lock(&vi->config_lock);
> >
> > @@ -1260,10 +1502,14 @@ static int virtnet_freeze(struct virtio_device *vdev)
> > mutex_unlock(&vi->config_lock);
> >
> > netif_device_detach(vi->dev);
> >
> > - cancel_delayed_work_sync(&vi->rq.refill);
> > + for (i = 0; i < vi->max_queue_pairs; i++)
> > + cancel_delayed_work_sync(&vi->rq[i].refill);
> >
> > if (netif_running(vi->dev))
> >
> > - napi_disable(&vi->rq.napi);
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + napi_disable(&vi->rq[i].napi);
> > + netif_napi_del(&vi->rq[i].napi);
> > + }
> >
> > remove_vq_common(vi);
> >
> > @@ -1275,24 +1521,28 @@ static int virtnet_freeze(struct virtio_device *vdev)
> > static int virtnet_restore(struct virtio_device *vdev)
> > {
> >
> > struct virtnet_info *vi = vdev->priv;
> >
> > - int err;
> > + int err, i;
> >
> > err = init_vqs(vi);
> > if (err)
> >
> > return err;
> >
> > if (netif_running(vi->dev))
> >
> > - virtnet_napi_enable(&vi->rq);
> > + for (i = 0; i < vi->max_queue_pairs; i++)
> > + virtnet_napi_enable(&vi->rq[i]);
> >
> > netif_device_attach(vi->dev);
> >
> > - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> > - schedule_delayed_work(&vi->rq.refill, 0);
> > + for (i = 0; i < vi->max_queue_pairs; i++)
> > + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> > + schedule_delayed_work(&vi->rq[i].refill, 0);
> >
> > mutex_lock(&vi->config_lock);
> > vi->config_enable = true;
> > mutex_unlock(&vi->config_lock);
> >
> > + BUG_ON(virtnet_set_queues(vi));
> > +
>
> Won't this always fail when control vq is off?
Yes, will add a check of VIRTIO_NET_F_RFS before calling virtnet_set_queues().
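Roughly what I have in mind, as a standalone sketch rather than driver code.
struct mq_state, set_queues_if_supported() and the two stub senders are
hypothetical stand-ins for virtnet_info and virtnet_send_command(). Returning 0
when the feature was not negotiated is an assumption about the desired
behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant virtnet_info fields. */
struct mq_state {
	bool has_cvq;			/* VIRTIO_NET_F_CTRL_VQ negotiated */
	bool has_rfs;			/* VIRTIO_NET_F_RFS negotiated */
	uint16_t curr_queue_pairs;
};

/* Stand-in for the control-vq command; returns true on success. */
typedef bool (*send_pairs_fn)(uint16_t pairs);

static int set_queues_if_supported(const struct mq_state *s, send_pairs_fn send)
{
	if (!s->has_rfs)
		return 0;		/* single queue only: nothing to configure */
	if (!s->has_cvq)
		return -EINVAL;		/* multiqueue needs the control vq */
	return send(s->curr_queue_pairs) ? 0 : -EINVAL;
}

/* Example stubs used to exercise the sketch. */
static bool stub_send_ok(uint16_t pairs) { (void)pairs; return true; }
static bool stub_send_fail(uint16_t pairs) { (void)pairs; return false; }
```

With a check like this in the caller, the restore path no longer trips the
BUG_ON() on a device that has no control vq and no multiqueue feature.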
>
> > return 0;
> >
> > }
> > #endif
> >
> > @@ -1310,7 +1560,7 @@ static unsigned int features[] = {
> >
> > VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
> > VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
> > VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> >
> > - VIRTIO_NET_F_GUEST_ANNOUNCE,
> > + VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_RFS,
> >
> > };
> >
> > static struct virtio_driver virtio_net_driver = {
> >
> > @@ -1328,6 +1578,12 @@ static struct virtio_driver virtio_net_driver = {
> >
> > #endif
> > };
> >
> > +static const struct ethtool_ops virtnet_ethtool_ops = {
> > + .get_drvinfo = virtnet_get_drvinfo,
> > + .get_link = ethtool_op_get_link,
> > + .get_ringparam = virtnet_get_ringparam,
> > +};
> > +
> >
> > static int __init init(void)
> > {
> >
> > return register_virtio_driver(&virtio_net_driver);
> >
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index 2470f54..6056cec 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -51,6 +51,7 @@
> >
> > #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
> > #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the
> > * network */
> >
> > +#define VIRTIO_NET_F_RFS 22 /* Device supports multiple TXQ/RXQ */
> >
> > #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
> > #define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
> >
> > @@ -60,6 +61,8 @@ struct virtio_net_config {
> >
> > __u8 mac[6];
> > /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> > __u16 status;
> >
> > + /* Total number of RX/TX queues */
> > + __u16 max_virtqueue_pairs;
> >
> > } __attribute__((packed));
> >
> > /* This is the first element of the scatter-gather list. If you don't
> >
> > @@ -166,4 +169,17 @@ struct virtio_net_ctrl_mac {
> >
> > #define VIRTIO_NET_CTRL_ANNOUNCE 3
> >
> > #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
> >
> > +/*
> > + * Control multiqueue
> > + *
> > + */
> > +struct virtio_net_ctrl_rfs {
> > + u16 virtqueue_pairs;
> > +};
> > +
> > +#define VIRTIO_NET_CTRL_RFS 4
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET 0
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN 1
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX 0x8000
> > +
> >
> > #endif /* _LINUX_VIRTIO_NET_H */
^ permalink raw reply
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03 6:05 UTC (permalink / raw)
To: Rusty Russell
Cc: krkumar2, kvm, mst, netdev, linux-kernel, virtualization,
bhutchings, jwhan, shiyer
In-Reply-To: <87vccjj3hj.fsf@rustcorp.com.au>
On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > +static const struct ethtool_ops virtnet_ethtool_ops;
> > +
> > +/*
> > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > + */
> > +static int vq2txq(struct virtqueue *vq)
> > +{
> > + int index = virtqueue_get_queue_index(vq);
> > + return index == 1 ? 0 : (index - 2) / 2;
> > +}
> > +
> > +static int txq2vq(int txq)
> > +{
> > + return txq ? 2 * txq + 2 : 1;
> > +}
> > +
> > +static int vq2rxq(struct virtqueue *vq)
> > +{
> > + int index = virtqueue_get_queue_index(vq);
> > + return index ? (index - 1) / 2 : 0;
> > +}
> > +
> > +static int rxq2vq(int rxq)
> > +{
> > + return rxq ? 2 * rxq + 1 : 0;
> > +}
> > +
>
> I thought MST changed the proposed spec to make the control queue always
> the last one, so this logic becomes trivial.
But that may break support for legacy guests. If we boot a legacy single-queue
guest on a 2-queue virtio-net device, it may think vq 2 is the cvq when it is
in fact rx1.
>
> > +static int virtnet_set_queues(struct virtnet_info *vi)
> > +{
> > + struct scatterlist sg;
> > + struct virtio_net_ctrl_rfs s;
> > + struct net_device *dev = vi->dev;
> > +
> > + s.virtqueue_pairs = vi->curr_queue_pairs;
> > + sg_init_one(&sg, &s, sizeof(s));
> > +
> > + if (!vi->has_cvq)
> > + return -EINVAL;
> > +
> > + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> > + VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> > + dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> > + " %d\n", vi->curr_queue_pairs);
> > + return -EINVAL;
> > + }
>
> Where do we check the VIRTIO_NET_F_RFS bit?
Yes, we need this check. I will have the caller do the check and add a comment
there.
>
> > static int virtnet_probe(struct virtio_device *vdev)
> > {
> >
> > - int err;
> > + int i, err;
> >
> > struct net_device *dev;
> > struct virtnet_info *vi;
> >
> > + u16 curr_queue_pairs;
> > +
> > + /* Find if host supports multiqueue virtio_net device */
> > + err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> > + offsetof(struct virtio_net_config,
> > + max_virtqueue_pairs), &curr_queue_pairs);
> > +
> > + /* We need at least 2 queue's */
> > + if (err)
> > + curr_queue_pairs = 1;
>
> Huh? Just call this queue_pairs. It's not curr_ at all...
>
> > + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> > + vi->has_cvq = true;
> > +
> > + /* Use single tx/rx queue pair as default */
> > + vi->curr_queue_pairs = 1;
> > + vi->max_queue_pairs = curr_queue_pairs;
>
> See...
Right, will use max_queue_pairs then.
Thanks
>
> Cheers,
> Rusty.
^ permalink raw reply
* Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool
From: Jason Wang @ 2012-12-03 6:09 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <20121202160906.GB27761@redhat.com>
On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote:
> > This patch implements the {set|get}_channels methods of ethtool to allow
> > the user to change the number of queues dynamically while the device is
> > running. This lets the user configure it on demand.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >
> > drivers/net/virtio_net.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > 1 files changed, 41 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index bcaa6e5..f08ec2a 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = {
> >
> > #endif
> > };
> >
> > +/* TODO: Eliminate OOO packets during switching */
> > +static int virtnet_set_channels(struct net_device *dev,
> > + struct ethtool_channels *channels)
> > +{
> > + struct virtnet_info *vi = netdev_priv(dev);
> > + u16 queue_pairs = channels->combined_count;
> > +
> > + /* We don't support separate rx/tx channels.
> > + * We don't allow setting 'other' channels.
> > + */
> > + if (channels->rx_count || channels->tx_count || channels->other_count)
> > + return -EINVAL;
> > +
> > + /* Only two modes were support currently */
> > + if (queue_pairs != vi->max_queue_pairs && queue_pairs != 1)
> > + return -EINVAL;
> > +
>
> Why the limitation?
I am not sure values between 1 and max_queue_pairs are useful. But anyway, I
can remove this limitation.
> Also how does userspace discover what the legal values are?
Userspace only checks whether the value is greater than max_queue_pairs.
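If the two-mode limitation is dropped, the driver-side check could reduce to a
plain range test. A standalone sketch under that assumption follows; struct
channels_req and check_channels() are hypothetical names that only mirror the
ethtool_channels fields used in the patch.

```c
#include <errno.h>

/* Hypothetical mirror of the ethtool_channels fields the patch looks at. */
struct channels_req {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int other_count;
	unsigned int combined_count;
};

/* Returns 0 if the request is acceptable, -EINVAL otherwise. */
static int check_channels(const struct channels_req *c,
			  unsigned int max_queue_pairs)
{
	/* Separate rx/tx channels and 'other' channels are not supported. */
	if (c->rx_count || c->tx_count || c->other_count)
		return -EINVAL;

	/* Any number of combined pairs from 1 up to the device maximum. */
	if (c->combined_count < 1 || c->combined_count > max_queue_pairs)
		return -EINVAL;

	return 0;
}
```

Userspace can read the upper bound from max_combined as reported by
get_channels, so any value from 1 up to that bound would be legal.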
>
> > + vi->curr_queue_pairs = queue_pairs;
> > + BUG_ON(virtnet_set_queues(vi));
> > +
> > + netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > + netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> > +
> > + return 0;
> > +}
> > +
> > +static void virtnet_get_channels(struct net_device *dev,
> > + struct ethtool_channels *channels)
> > +{
> > + struct virtnet_info *vi = netdev_priv(dev);
> > +
> > + channels->combined_count = vi->curr_queue_pairs;
> > + channels->max_combined = vi->max_queue_pairs;
> > + channels->max_other = 0;
> > + channels->rx_count = 0;
> > + channels->tx_count = 0;
> > + channels->other_count = 0;
> > +}
> > +
> >
> > static const struct ethtool_ops virtnet_ethtool_ops = {
> >
> > .get_drvinfo = virtnet_get_drvinfo,
> > .get_link = ethtool_op_get_link,
> > .get_ringparam = virtnet_get_ringparam,
> >
> > + .set_channels = virtnet_set_channels,
> > + .get_channels = virtnet_get_channels,
> >
> > };
> >
> > static int __init init(void)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [net-next RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent
From: Jason Wang @ 2012-12-03 6:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: rusty, mst, davem, netdev, linux-kernel, virtualization
In-Reply-To: <20121127084919.1587c647@nehalam.linuxnetplumber.net>
On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> On Tue, 27 Nov 2012 14:45:13 +0800
>
> Jason Wang <jasowang@redhat.com> wrote:
> > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > On Mon, 26 Nov 2012 15:56:52 +0800
> > >
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >> Some devices do not free old tx skbs immediately after they have been
> > >> sent (usually in the tx interrupt). One such example is virtio-net, which
> > >> optimizes for virtualization and only frees old tx skbs during the
> > >> next packet transmission. This leads pktgen to wait forever on the
> > >> refcount of the skb if no other packet is sent afterwards.
> > >>
> > >> Solve this issue by introducing a new flag, IFF_TX_SKB_FREE_DELAY,
> > >> which notifies pktgen that the device does not free the skb
> > >> immediately after it has been sent, so that pktgen does not wait for
> > >> the refcount to drop to one.
> > >>
> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > Another alternative would be using skb_orphan() and skb->destructor.
> > > There are other cases where skb's are not freed right away.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > Hi Stephen:
> >
> > Do you mean registering an skb->destructor for pktgen, then setting and
> > checking bits in skb->tx_flag?
>
> Yes. Register a destructor that does something like update a counter (number
> of packets pending), then just spin while number of packets pending is over
> threshold.
I ran some experiments on this, and it does not seem to work well when
clone_skb is used. For drivers that call skb_orphan() in ndo_start_xmit, the
destructor is only called when the first packet is sent, but what we need to
know is when the last one is sent. Any thoughts on this, or can we just
introduce another flag (we already have something like IFF_TX_SKB_SHARING)?
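To make the problem concrete, here is a minimal user-space sketch of a destructor-driven pending counter. It is not pktgen code: struct fake_skb, track_skb(), fake_orphan() and may_send() are made-up stand-ins, and the only kernel behaviour assumed is that skb_orphan() runs the destructor once and then clears it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of struct sk_buff that matters here. */
struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
	atomic_int *pending;	/* counter owned by the packet generator */
};

/* Destructor: one fewer packet is pending. */
static void pending_destructor(struct fake_skb *skb)
{
	atomic_fetch_sub(skb->pending, 1);
}

/* Called by the generator before handing the skb to the driver. */
static void track_skb(struct fake_skb *skb, atomic_int *pending)
{
	assert(skb != NULL && pending != NULL);
	skb->pending = pending;
	skb->destructor = pending_destructor;
	atomic_fetch_add(pending, 1);
}

/* Conceptually what skb_orphan() does: fire the destructor once, clear it. */
static void fake_orphan(struct fake_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
	}
}

/* The generator spins while this returns 0. */
static int may_send(atomic_int *pending, int threshold)
{
	return atomic_load(pending) < threshold;
}
```

If the same skb is transmitted repeatedly, only the first fake_orphan() call changes the counter, so the counter reaches zero while later transmissions of that skb may still be in flight.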
Thanks
^ permalink raw reply
* Re: [PATCH V2 1/5] cxgb4: Add T4 filter support
From: Vipul Pandya @ 2012-12-03 6:48 UTC (permalink / raw)
To: David Miller
Cc: roland@purestorage.com, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, Divy Le Ray, Dimitrios Michailidis,
Kumar A S, Steve Wise, Abhishek Agrawal
In-Reply-To: <20121130.222441.428094549161714937.davem@davemloft.net>
On 01-12-2012 08:54, David Miller wrote:
> From: Roland Dreier <roland@purestorage.com>
> Date: Fri, 30 Nov 2012 17:43:27 -0800
>
>> On Fri, Nov 30, 2012 at 8:56 AM, David Miller <davem@davemloft.net> wrote:
>>> I really don't understand how we're supposed to review your patches
>>> when you only post some parts of the patch series to netdev, and
>>> others not.
>>
>> I think he just didn't repost the patches that were unchanged from the
>> first time around.
>
> Well, he needs to.
>
I was not aware of this. Sorry for the inconvenience. I will
resubmit the whole series.
Thanks,
Vipul
^ permalink raw reply
* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chen Gang @ 2012-12-03 8:56 UTC (permalink / raw)
To: David.Woodhouse, David Miller, krzysiek, Joe Perches, edumazet,
netdev
In-Reply-To: <50AC58BC.1020004@asianux.com>
Hello Maintainers:
Has this suggestion been replied to? (It seems not.)
Please help check whether it is valid.
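To illustrate what the suggestion amounts to, here is a minimal user-space sketch of bounded formatting. It is not a proposed kernel patch: format_addrs(), ESA_LEN and LINE_LEN are hypothetical names, ESA_LEN is assumed to match ATM_ESA_LEN, and the caller passes the buffer size (one page in the sysfs case) explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define ESA_LEN 20	/* assumed to match ATM_ESA_LEN */
/* 40 hex digits + 4 dots + newline per address */
#define LINE_LEN ((size_t)(2 * ESA_LEN + 4 + 1))

/*
 * Format n addresses (n * ESA_LEN bytes at addrs) into buf, one per line,
 * stopping before the first address that would not fit completely.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t format_addrs(char *buf, size_t size,
			   const unsigned char *addrs, size_t n)
{
	static const int groups[] = { 1, 2, 10, 6, 1 };
	size_t used = 0;

	assert(buf != NULL && size > 0);

	for (size_t a = 0; a < n; a++) {
		const unsigned char *addr = addrs + a * ESA_LEN;
		size_t g = 0;		/* restarted for every address */
		int in_group = 0;

		/* Keep room for the whole line plus the trailing NUL. */
		if (size - used < LINE_LEN + 1)
			break;

		for (size_t i = 0; i < ESA_LEN; i++) {
			if (in_group == groups[g]) {
				buf[used++] = '.';
				g++;
				in_group = 0;
			}
			used += (size_t)snprintf(buf + used, size - used,
						 "%02x", addr[i]);
			in_group++;
		}
		buf[used++] = '\n';
	}
	buf[used] = '\0';
	return used;
}
```

Dropping whole addresses rather than cutting one in half is a design choice of this sketch; returning an error when the list does not fit would be another option.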
thanks.
gchen.
On 2012-11-21 12:29, Chen Gang wrote:
> Hello David Miller:
>
> in net/atm/atm_sysfs.c:
> I suggest checking whether the total write length can exceed one page.
> The buffer passed as parameter buf is one page in size (reference: fill_read_buffer in fs/sysfs/file.c),
> and the number of ATM addresses is not limited (reference: atm_dev_ioctl -> atm_add_addr).
>
> thanks.
>
> gchen.
>
> static ssize_t show_atmaddress(struct device *cdev,
> 			       struct device_attribute *attr, char *buf)
> {
> 	unsigned long flags;
> 	char *pos = buf;
> 	struct atm_dev *adev = to_atm_dev(cdev);
> 	struct atm_dev_addr *aaddr;
> 	int bin[] = { 1, 2, 10, 6, 1 }, *fmt = bin;
> 	int i, j;
>
> 	spin_lock_irqsave(&adev->lock, flags);
> 	list_for_each_entry(aaddr, &adev->local, entry) {
> 		for (i = 0, j = 0; i < ATM_ESA_LEN; ++i, ++j) {
> 			if (j == *fmt) {
> 				pos += sprintf(pos, ".");
> 				++fmt;
> 				j = 0;
> 			}
> 			pos += sprintf(pos, "%02x",
> 				       aaddr->addr.sas_addr.prv[i]);
> 		}
> 		pos += sprintf(pos, "\n");
> 	}
> 	spin_unlock_irqrestore(&adev->lock, flags);
>
> 	return pos - buf;
> }
>
>
>
>
> in net/atm/addr.c
>
> int atm_add_addr(struct atm_dev *dev, const struct sockaddr_atmsvc *addr,
> 		 enum atm_addr_type_t atype)
> {
> 	unsigned long flags;
> 	struct atm_dev_addr *this;
> 	struct list_head *head;
> 	int error;
>
> 	error = check_addr(addr);
> 	if (error)
> 		return error;
> 	spin_lock_irqsave(&dev->lock, flags);
> 	if (atype == ATM_ADDR_LECS)
> 		head = &dev->lecs;
> 	else
> 		head = &dev->local;
> 	list_for_each_entry(this, head, entry) {
> 		if (identical(&this->addr, addr)) {
> 			spin_unlock_irqrestore(&dev->lock, flags);
> 			return -EEXIST;
> 		}
> 	}
> 	this = kmalloc(sizeof(struct atm_dev_addr), GFP_ATOMIC);
> 	if (!this) {
> 		spin_unlock_irqrestore(&dev->lock, flags);
> 		return -ENOMEM;
> 	}
> 	this->addr = *addr;
> 	list_add(&this->entry, head);
> 	spin_unlock_irqrestore(&dev->lock, flags);
> 	if (head == &dev->local)
> 		notify_sigd(dev);
> 	return 0;
> }
>
>
>
> in net/atm/resources.c
>
> int atm_dev_ioctl(unsigned int cmd, void __user *arg, int compat)
> {
> 	void __user *buf;
> 	int error, len, number, size = 0;
> 	struct atm_dev *dev;
> 	struct list_head *p;
> 	int *tmp_buf, *tmp_p;
> 	int __user *sioc_len;
> 	int __user *iobuf_len;
>
> #ifndef CONFIG_COMPAT
> 	compat = 0; /* Just so the compiler _knows_ */
> #endif
>
> 	switch (cmd) {
> 	case ATM_GETNAMES:
> 		if (compat) {
> #ifdef CONFIG_COMPAT
> 			struct compat_atm_iobuf __user *ciobuf = arg;
> 			compat_uptr_t cbuf;
> 			iobuf_len = &ciobuf->length;
> 			if (get_user(cbuf, &ciobuf->buffer))
> 				return -EFAULT;
> 			buf = compat_ptr(cbuf);
> #endif
> 		} else {
> 			struct atm_iobuf __user *iobuf = arg;
> 			iobuf_len = &iobuf->length;
> 			if (get_user(buf, &iobuf->buffer))
> 				return -EFAULT;
> 		}
> 		if (get_user(len, iobuf_len))
> 			return -EFAULT;
> 		mutex_lock(&atm_dev_mutex);
> 		list_for_each(p, &atm_devs)
> 			size += sizeof(int);
> 		if (size > len) {
> 			mutex_unlock(&atm_dev_mutex);
> 			return -E2BIG;
> 		}
> 		tmp_buf = kmalloc(size, GFP_ATOMIC);
> 		if (!tmp_buf) {
> 			mutex_unlock(&atm_dev_mutex);
> 			return -ENOMEM;
> 		}
> 		tmp_p = tmp_buf;
> 		list_for_each(p, &atm_devs) {
> 			dev = list_entry(p, struct atm_dev, dev_list);
> 			*tmp_p++ = dev->number;
> 		}
> 		mutex_unlock(&atm_dev_mutex);
> 		error = ((copy_to_user(buf, tmp_buf, size)) ||
> 			 put_user(size, iobuf_len))
> 			? -EFAULT : 0;
> 		kfree(tmp_buf);
> 		return error;
> 	default:
> 		break;
> 	}
>
> 	if (compat) {
> #ifdef CONFIG_COMPAT
> 		struct compat_atmif_sioc __user *csioc = arg;
> 		compat_uptr_t carg;
>
> 		sioc_len = &csioc->length;
> 		if (get_user(carg, &csioc->arg))
> 			return -EFAULT;
> 		buf = compat_ptr(carg);
>
> 		if (get_user(len, &csioc->length))
> 			return -EFAULT;
> 		if (get_user(number, &csioc->number))
> 			return -EFAULT;
> #endif
> 	} else {
> 		struct atmif_sioc __user *sioc = arg;
>
> 		sioc_len = &sioc->length;
> 		if (get_user(buf, &sioc->arg))
> 			return -EFAULT;
> 		if (get_user(len, &sioc->length))
> 			return -EFAULT;
> 		if (get_user(number, &sioc->number))
> 			return -EFAULT;
> 	}
>
> 	dev = try_then_request_module(atm_dev_lookup(number), "atm-device-%d",
> 				      number);
> 	if (!dev)
> 		return -ENODEV;
>
> 	switch (cmd) {
> 	case ATM_GETTYPE:
> 		size = strlen(dev->type) + 1;
> 		if (copy_to_user(buf, dev->type, size)) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		break;
> 	case ATM_GETESI:
> 		size = ESI_LEN;
> 		if (copy_to_user(buf, dev->esi, size)) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		break;
> 	case ATM_SETESI:
> 	{
> 		int i;
>
> 		for (i = 0; i < ESI_LEN; i++)
> 			if (dev->esi[i]) {
> 				error = -EEXIST;
> 				goto done;
> 			}
> 	}
> 	/* fall through */
> 	case ATM_SETESIF:
> 	{
> 		unsigned char esi[ESI_LEN];
>
> 		if (!capable(CAP_NET_ADMIN)) {
> 			error = -EPERM;
> 			goto done;
> 		}
> 		if (copy_from_user(esi, buf, ESI_LEN)) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		memcpy(dev->esi, esi, ESI_LEN);
> 		error = ESI_LEN;
> 		goto done;
> 	}
> 	case ATM_GETSTATZ:
> 		if (!capable(CAP_NET_ADMIN)) {
> 			error = -EPERM;
> 			goto done;
> 		}
> 		/* fall through */
> 	case ATM_GETSTAT:
> 		size = sizeof(struct atm_dev_stats);
> 		error = fetch_stats(dev, buf, cmd == ATM_GETSTATZ);
> 		if (error)
> 			goto done;
> 		break;
> 	case ATM_GETCIRANGE:
> 		size = sizeof(struct atm_cirange);
> 		if (copy_to_user(buf, &dev->ci_range, size)) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		break;
> 	case ATM_GETLINKRATE:
> 		size = sizeof(int);
> 		if (copy_to_user(buf, &dev->link_rate, size)) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		break;
> 	case ATM_RSTADDR:
> 		if (!capable(CAP_NET_ADMIN)) {
> 			error = -EPERM;
> 			goto done;
> 		}
> 		atm_reset_addr(dev, ATM_ADDR_LOCAL);
> 		break;
> 	case ATM_ADDADDR:
> 	case ATM_DELADDR:
> 	case ATM_ADDLECSADDR:
> 	case ATM_DELLECSADDR:
> 	{
> 		struct sockaddr_atmsvc addr;
>
> 		if (!capable(CAP_NET_ADMIN)) {
> 			error = -EPERM;
> 			goto done;
> 		}
>
> 		if (copy_from_user(&addr, buf, sizeof(addr))) {
> 			error = -EFAULT;
> 			goto done;
> 		}
> 		if (cmd == ATM_ADDADDR || cmd == ATM_ADDLECSADDR)
> 			error = atm_add_addr(dev, &addr,
> 					     (cmd == ATM_ADDADDR ?
> 					      ATM_ADDR_LOCAL : ATM_ADDR_LECS));
> 		else
> 			error = atm_del_addr(dev, &addr,
> 					     (cmd == ATM_DELADDR ?
> 					      ATM_ADDR_LOCAL : ATM_ADDR_LECS));
> 		goto done;
> 	}
> ... ...
> ... ...
>
>
>
--
Chen Gang
Asianux Corporation
^ permalink raw reply
* re: netdevice wanrouter: Convert directly reference of netdev->priv
From: Dan Carpenter @ 2012-12-03 9:04 UTC (permalink / raw)
To: wangchen; +Cc: netdev
Hello Wang Chen,
The patch 7be6065b39c3: "netdevice wanrouter: Convert directly
reference of netdev->priv" from Nov 20, 2008, leads to the following
Smatch warning:
net/wanrouter/wanmain.c:610 wanrouter_device_new_if()
error: potential NULL dereference 'dev'.
This is an old patch from 2008. It removed the allocation in
wanrouter_device_new_if() so it looks like wanrouter has been completely
broken for four years.
@@ -589,10 +591,6 @@ static int wanrouter_device_new_if(struct wan_device *wandev,
err = -EPROTONOSUPPORT;
goto out;
} else {
- dev = kzalloc(sizeof(struct net_device), GFP_KERNEL);
- err = -ENOBUFS;
- if (dev == NULL)
- goto out;
err = wandev->new_if(wandev, dev, cnf);
"dev" is still NULL after the call to ->new_if().
}
Here is what the code looks like now:
net/wanrouter/wanmain.c
590 if (cnf->config_id == WANCONFIG_MPPP) {
591 printk(KERN_INFO "%s: Wanpipe Mulit-Port PPP support has not been compiled in!\n",
592 wandev->name);
593 err = -EPROTONOSUPPORT;
594 goto out;
595 } else {
We were supposed to allocate "dev" here.
596 err = wandev->new_if(wandev, dev, cnf);
597 }
598
599 if (!err) {
600 /* Register network interface. This will invoke init()
601 * function supplied by the driver. If device registered
602 * successfully, add it to the interface list.
603 */
604
605 #ifdef WANDEBUG
606 printk(KERN_INFO "%s: registering interface %s...\n",
607 wanrouter_modname, dev->name);
608 #endif
609
610 err = register_netdev(dev);
^^^^^^^^^^^^^^^^^^^^
The kernel will always oops inside the call to register_netdev() because
"dev" is still NULL.
I suspect we should just revert the patch?
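For anyone wondering why the callee cannot fix this up on its own, here is a small user-space sketch of the underlying C behaviour. The names (struct fake_netdev, new_if_by_value(), new_if_by_ref()) are made up for illustration and are not wanrouter code: a pointer passed by value stays NULL in the caller no matter what the callee assigns to its own copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device. */
struct fake_netdev {
	char name[16];
};

/*
 * Same shape as the ->new_if() call: 'dev' arrives by value, so an
 * allocation made here never reaches the caller's variable.
 */
static int new_if_by_value(struct fake_netdev *dev)
{
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return -1;
	free(dev);	/* released only to keep this demo leak-free */
	return 0;
}

/* For contrast: with a pointer to the pointer, the caller does see it. */
static int new_if_by_ref(struct fake_netdev **devp)
{
	struct fake_netdev *dev;

	assert(devp != NULL);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return -1;
	*devp = dev;
	return 0;
}
```

The pre-2008 code avoided the question entirely by allocating in the caller before calling ->new_if(), which is what a revert would restore.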
regards,
dan carpenter
^ permalink raw reply
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Michael S. Tsirkin @ 2012-12-03 9:47 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <20845723.CY8SZ4xV0F@jason-thinkpad-t430s>
On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
> On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
> > Jason Wang <jasowang@redhat.com> writes:
> > > +static const struct ethtool_ops virtnet_ethtool_ops;
> > > +
> > > +/*
> > > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > > + */
> > > +static int vq2txq(struct virtqueue *vq)
> > > +{
> > > + int index = virtqueue_get_queue_index(vq);
> > > + return index == 1 ? 0 : (index - 2) / 2;
> > > +}
> > > +
> > > +static int txq2vq(int txq)
> > > +{
> > > + return txq ? 2 * txq + 2 : 1;
> > > +}
> > > +
> > > +static int vq2rxq(struct virtqueue *vq)
> > > +{
> > > + int index = virtqueue_get_queue_index(vq);
> > > + return index ? (index - 1) / 2 : 0;
> > > +}
> > > +
> > > +static int rxq2vq(int rxq)
> > > +{
> > > + return rxq ? 2 * rxq + 1 : 0;
> > > +}
> > > +
> >
> > I thought MST changed the proposed spec to make the control queue always
> > the last one, so this logic becomes trivial.
>
> But it may break support for legacy guests. If we boot a legacy single-queue
> guest on a 2-queue virtio-net device, it may think vq 2 is the cvq when it is
> in fact rx1.
Legacy guest support should be handled by the host using feature
bits in the usual way: the host should detect a legacy guest
by checking the VIRTIO_NET_F_RFS feature.
If VIRTIO_NET_F_RFS is acked, cvq is vq max_virtqueue_pairs * 2.
If it's not acked, cvq is vq 2.
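A minimal sketch of how trivial the mapping gets under that rule. The helper names mirror the ones in the patch but take plain integers, and the data-queue layout (0:rx0 1:tx0 2:rx1 3:tx1 ...) is my assumption of what "control queue last" implies; only the cvq rule itself is taken from the two lines above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout with the control queue last: rxN is vq 2N, txN is vq 2N+1. */
static int rxq2vq(int rxq)
{
	return 2 * rxq;
}

static int txq2vq(int txq)
{
	return 2 * txq + 1;
}

static int vq2rxq(int vq)
{
	return vq / 2;
}

static int vq2txq(int vq)
{
	return (vq - 1) / 2;
}

/*
 * Control queue index: after all data queues if the guest acked
 * VIRTIO_NET_F_RFS, otherwise the legacy position, vq 2.
 */
static int cvq_index(bool rfs_acked, int max_virtqueue_pairs)
{
	assert(max_virtqueue_pairs >= 1);
	return rfs_acked ? 2 * max_virtqueue_pairs : 2;
}
```

For a device with 4 pairs this gives cvq = 8 for a multiqueue guest and cvq = 2 for a legacy one, and no special case is needed for queue pair 0.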
^ permalink raw reply
* [PATCH 1/1] net: mvneta: Remove unneeded version.h header inclusion
From: Sachin Kamat @ 2012-12-03 9:45 UTC (permalink / raw)
To: netdev; +Cc: thomas.petazzoni, sachin.kamat, patches
linux/version.h inclusion is not necessary.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
drivers/net/ethernet/marvell/mvneta.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 3f8086b..a1e0e9f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -12,7 +12,6 @@
*/
#include <linux/kernel.h>
-#include <linux/version.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/platform_device.h>
--
1.7.4.1
^ permalink raw reply related
* [PATCH net 1/1] bnx2x: recognize fan failure
From: Yuval Mintz @ 2012-12-03 9:56 UTC (permalink / raw)
To: davem, netdev; +Cc: ariele, Yaniv Rosner, Yuval Mintz, Eilon Greenstein
From: Yaniv Rosner <yaniv.rosner@broadcom.com>
If fan failure is detected, MCP prevents PCI I/O registers from being
mapped to the bar, causing a fatal error as the driver is unaware.
This patch recognizes that such an event has occurred and gracefully
terminates the probe process.
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
Hi Dave,
This patch prevents a fatal error on newer bnx2x boards.
Please consider applying it to 'net'.
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 8 ++++++++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 4 +++-
2 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 01611b3..101392b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -11346,6 +11346,14 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
goto err_out_disable;
}
+ pci_read_config_dword(pdev, PCICFG_REVISION_ID_OFFSET, &pci_cfg_dword);
+ if ((pci_cfg_dword & PCICFG_REVESION_ID_MASK) ==
+ PCICFG_REVESION_ID_ERROR_VAL) {
+ pr_err("PCI device error, probably due to fan failure, aborting\n");
+ rc = -ENODEV;
+ goto err_out_disable;
+ }
+
if (atomic_read(&pdev->enable_cnt) == 1) {
rc = pci_request_regions(pdev, DRV_MODULE_NAME);
if (rc) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index 1b1999d..698cbf7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -6130,7 +6130,9 @@
#define PCICFG_COMMAND_INT_DISABLE (1<<10)
#define PCICFG_COMMAND_RESERVED (0x1f<<11)
#define PCICFG_STATUS_OFFSET 0x06
-#define PCICFG_REVESION_ID_OFFSET 0x08
+#define PCICFG_REVISION_ID_OFFSET 0x08
+#define PCICFG_REVESION_ID_MASK 0xff
+#define PCICFG_REVESION_ID_ERROR_VAL 0xff
#define PCICFG_CACHE_LINE_SIZE 0x0c
#define PCICFG_LATENCY_TIMER 0x0d
#define PCICFG_BAR_1_LOW 0x10
--
1.7.1
^ permalink raw reply related
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03 10:01 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <20121203094735.GA23009@redhat.com>
On 12/03/2012 05:47 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
>> On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
>>> Jason Wang <jasowang@redhat.com> writes:
>>>> +static const struct ethtool_ops virtnet_ethtool_ops;
>>>> +
>>>> +/*
>>>> + * Converting between virtqueue no. and kernel tx/rx queue no.
>>>> + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
>>>> + */
>>>> +static int vq2txq(struct virtqueue *vq)
>>>> +{
>>>> + int index = virtqueue_get_queue_index(vq);
>>>> + return index == 1 ? 0 : (index - 2) / 2;
>>>> +}
>>>> +
>>>> +static int txq2vq(int txq)
>>>> +{
>>>> + return txq ? 2 * txq + 2 : 1;
>>>> +}
>>>> +
>>>> +static int vq2rxq(struct virtqueue *vq)
>>>> +{
>>>> + int index = virtqueue_get_queue_index(vq);
>>>> + return index ? (index - 1) / 2 : 0;
>>>> +}
>>>> +
>>>> +static int rxq2vq(int rxq)
>>>> +{
>>>> + return rxq ? 2 * rxq + 1 : 0;
>>>> +}
>>>> +
>>> I thought MST changed the proposed spec to make the control queue always
>>> the last one, so this logic becomes trivial.
>> But it may break support for legacy guests. If we boot a legacy single-queue
>> guest on a 2-queue virtio-net device, it may think vq 2 is the cvq when it is
>> in fact rx1.
> Legacy guest support should be handled by the host using feature
> bits in the usual way: the host should detect a legacy guest
> by checking the VIRTIO_NET_F_RFS feature.
>
> If VIRTIO_NET_F_RFS is acked, cvq is vq max_virtqueue_pairs * 2.
> If it's not acked, cvq is vq 2.
>
We could, but we would not gain much from this. Furthermore, we would also
need to create and destroy virtqueues dynamically during feature negotiation,
which does not seem to be supported in qemu now.
^ permalink raw reply
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Michael S. Tsirkin @ 2012-12-03 10:14 UTC (permalink / raw)
To: Jason Wang
Cc: rusty, krkumar2, virtualization, netdev, linux-kernel, kvm,
bhutchings, jwhan, shiyer
In-Reply-To: <1354011360-39479-3-git-send-email-jasowang@redhat.com>
On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> - schedule_delayed_work(&vi->rq.refill, 0);
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> + schedule_delayed_work(&vi->rq[i].refill, 0);
>
> mutex_lock(&vi->config_lock);
> vi->config_enable = true;
> mutex_unlock(&vi->config_lock);
>
> + BUG_ON(virtnet_set_queues(vi));
> +
> return 0;
> }
> #endif
Also, crashing when the device nacks the command is not nice.
In this case it seems we can just switch to
single-queue mode, which should always be safe.
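Roughly what I have in mind, as a user-space sketch rather than driver code. struct fake_vi, fake_set_queues() and set_queues_or_fallback() are hypothetical; fake_set_queues() merely stands in for virtnet_set_queues() and its device_accepts field is a test knob.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct virtnet_info. */
struct fake_vi {
	int curr_queue_pairs;
	int device_accepts;	/* test knob: nonzero if the device acks */
};

/* Stand-in for virtnet_set_queues(): 0 on success, -EINVAL on nack. */
static int fake_set_queues(struct fake_vi *vi)
{
	return vi->device_accepts ? 0 : -EINVAL;
}

/*
 * Try to use 'wanted' queue pairs. If the device rejects the command,
 * fall back to a single pair instead of crashing.
 * Returns the number of queue pairs actually in effect.
 */
static int set_queues_or_fallback(struct fake_vi *vi, int wanted)
{
	assert(vi != NULL && wanted >= 1);

	vi->curr_queue_pairs = wanted;
	if (fake_set_queues(vi) != 0)
		vi->curr_queue_pairs = 1;	/* single queue is always safe */

	return vi->curr_queue_pairs;
}
```

The caller would then pass the returned value to netif_set_real_num_tx_queues() and netif_set_real_num_rx_queues(), and could log a warning when it differs from what was requested.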
^ permalink raw reply
* Re: [RFC PATCH 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-03 10:15 UTC (permalink / raw)
To: Paul Moore; +Cc: netdev, linux-security-module, selinux
In-Reply-To: <20121129220637.30020.9980.stgit@sifl>
On 11/30/2012 06:06 AM, Paul Moore wrote:
> This patch corrects some problems with LSM/SELinux that were introduced
> with the multiqueue patchset. The problem stems from the fact that the
> multiqueue work changed the relationship between the tun device and its
> associated socket; before the socket persisted for the life of the
> device, however after the multiqueue changes the socket only persisted
> for the life of the userspace connection (fd open). For non-persistent
> devices this is not an issue, but for persistent devices this can cause
> the tun device to lose its SELinux label.
>
> We correct this problem by adding an opaque LSM security blob to the
> tun device struct which allows us to have the LSM security state, e.g.
> SELinux labeling information, persist for the lifetime of the tun
> device.
Hi Paul, thanks for the patchset. I've one question, see below.
>
> Signed-off-by: Paul Moore <pmoore@redhat.com>
> ---
> drivers/net/tun.c | 13 ++++++++---
> include/linux/security.h | 37 ++++++++++++++++++++++-----------
> security/capability.c | 14 +++++++++---
> security/security.c | 22 ++++++++++++-------
> security/selinux/hooks.c | 42 +++++++++++++++++++++++--------------
> security/selinux/include/objsec.h | 4 ++++
> 6 files changed, 88 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 877ffe2..85cc924 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -182,6 +182,7 @@ struct tun_struct {
> struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
> struct timer_list flow_gc_timer;
> unsigned long ageing_time;
> + void *security;
> };
>
> static inline u32 tun_hashfn(u32 rxhash)
> @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> struct tun_file *tfile = file->private_data;
> int err;
>
> + err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> + if (err < 0)
> + goto out;
> +
> err = -EINVAL;
> if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
> goto out;
> @@ -1365,6 +1370,7 @@ static void tun_free_netdev(struct net_device *dev)
> struct tun_struct *tun = netdev_priv(dev);
>
> tun_flow_uninit(tun);
> + security_tun_dev_free_security(tun->security);
> free_netdev(dev);
> }
>
> @@ -1548,9 +1554,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>
> if (tun_not_capable(tun))
> return -EPERM;
> - err = security_tun_dev_attach(tfile->socket.sk);
> - if (err < 0)
> - return err;
>
> err = tun_attach(tun, file);
> if (err < 0)
> @@ -1601,7 +1604,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>
> spin_lock_init(&tun->lock);
>
> - security_tun_dev_post_create(&tfile->sk);
> + err = security_tun_dev_alloc_security(&tun->security);
> + if (err < 0)
> + goto err_free_dev;
>
> tun_net_init(dev);
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 05e88bd..260e151 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -983,17 +983,23 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
> * tells the LSM to decrement the number of secmark labeling rules loaded
> * @req_classify_flow:
> * Sets the flow's sid to the openreq sid.
> + * @tun_dev_alloc_security:
> + * This hook allows a module to allocate a security structure for a TUN
> + * device.
> + * @security pointer to a security structure pointer.
> + * Returns a zero on success, negative values on failure.
> + * @tun_dev_free_security:
> + * This hook allows a module to free the security structure for a TUN
> + * device.
> + * @security pointer to the TUN device's security structure
> * @tun_dev_create:
> * Check permissions prior to creating a new TUN device.
> - * @tun_dev_post_create:
> - * This hook allows a module to update or allocate a per-socket security
> - * structure.
> - * @sk contains the newly created sock structure.
> * @tun_dev_attach:
> * Check permissions prior to attaching to a persistent TUN device. This
> * hook can also be used by the module to update any security state
> * associated with the TUN device's sock structure.
> * @sk contains the existing sock structure.
> + * @security pointer to the TUN device's security structure.
> *
> * Security hooks for XFRM operations.
> *
> @@ -1613,9 +1619,10 @@ struct security_operations {
> void (*secmark_refcount_inc) (void);
> void (*secmark_refcount_dec) (void);
> void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
> - int (*tun_dev_create)(void);
> - void (*tun_dev_post_create)(struct sock *sk);
> - int (*tun_dev_attach)(struct sock *sk);
> + int (*tun_dev_alloc_security) (void **security);
> + void (*tun_dev_free_security) (void *security);
> + int (*tun_dev_create) (void);
> + int (*tun_dev_attach) (struct sock *sk, void *security);
> #endif /* CONFIG_SECURITY_NETWORK */
>
> #ifdef CONFIG_SECURITY_NETWORK_XFRM
> @@ -2553,9 +2560,10 @@ void security_inet_conn_established(struct sock *sk,
> int security_secmark_relabel_packet(u32 secid);
> void security_secmark_refcount_inc(void);
> void security_secmark_refcount_dec(void);
> +int security_tun_dev_alloc_security(void **security);
> +void security_tun_dev_free_security(void *security);
> int security_tun_dev_create(void);
> -void security_tun_dev_post_create(struct sock *sk);
> -int security_tun_dev_attach(struct sock *sk);
> +int security_tun_dev_attach(struct sock *sk, void *security);
>
> #else /* CONFIG_SECURITY_NETWORK */
> static inline int security_unix_stream_connect(struct sock *sock,
> @@ -2720,16 +2728,21 @@ static inline void security_secmark_refcount_dec(void)
> {
> }
>
> -static inline int security_tun_dev_create(void)
> +static inline int security_tun_dev_alloc_security(void **security)
> {
> return 0;
> }
>
> -static inline void security_tun_dev_post_create(struct sock *sk)
> +static inline void security_tun_dev_free_security(void *security)
> {
> }
>
> -static inline int security_tun_dev_attach(struct sock *sk)
> +static inline int security_tun_dev_create(void)
> +{
> + return 0;
> +}
> +
> +static inline int security_tun_dev_attach(struct sock *sk, void *security)
> {
> return 0;
> }
> diff --git a/security/capability.c b/security/capability.c
> index b14a30c..fd6e2dc 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -704,16 +704,21 @@ static void cap_req_classify_flow(const struct request_sock *req,
> {
> }
>
> -static int cap_tun_dev_create(void)
> +static int cap_tun_dev_alloc_security(void **security)
> {
> return 0;
> }
>
> -static void cap_tun_dev_post_create(struct sock *sk)
> +static void cap_tun_dev_free_security(void *security)
> +{
> +}
> +
> +static int cap_tun_dev_create(void)
> {
> + return 0;
> }
>
> -static int cap_tun_dev_attach(struct sock *sk)
> +static int cap_tun_dev_attach(struct sock *sk, void *security)
> {
> return 0;
> }
> @@ -1044,8 +1049,9 @@ void __init security_fixup_ops(struct security_operations *ops)
> set_to_cap_if_null(ops, secmark_refcount_inc);
> set_to_cap_if_null(ops, secmark_refcount_dec);
> set_to_cap_if_null(ops, req_classify_flow);
> + set_to_cap_if_null(ops, tun_dev_alloc_security);
> + set_to_cap_if_null(ops, tun_dev_free_security);
> set_to_cap_if_null(ops, tun_dev_create);
> - set_to_cap_if_null(ops, tun_dev_post_create);
> set_to_cap_if_null(ops, tun_dev_attach);
> #endif /* CONFIG_SECURITY_NETWORK */
> #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/security.c b/security/security.c
> index 8dcd4ae..613ad36 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1244,21 +1244,27 @@ void security_secmark_refcount_dec(void)
> }
> EXPORT_SYMBOL(security_secmark_refcount_dec);
>
> -int security_tun_dev_create(void)
> +int security_tun_dev_alloc_security(void **security)
> {
> - return security_ops->tun_dev_create();
> + return security_ops->tun_dev_alloc_security(security);
> }
> -EXPORT_SYMBOL(security_tun_dev_create);
> +EXPORT_SYMBOL(security_tun_dev_alloc_security);
>
> -void security_tun_dev_post_create(struct sock *sk)
> +void security_tun_dev_free_security(void *security)
> {
> - return security_ops->tun_dev_post_create(sk);
> + security_ops->tun_dev_free_security(security);
> }
> -EXPORT_SYMBOL(security_tun_dev_post_create);
> +EXPORT_SYMBOL(security_tun_dev_free_security);
> +
> +int security_tun_dev_create(void)
> +{
> + return security_ops->tun_dev_create();
> +}
> +EXPORT_SYMBOL(security_tun_dev_create);
>
> -int security_tun_dev_attach(struct sock *sk)
> +int security_tun_dev_attach(struct sock *sk, void *security)
> {
> - return security_ops->tun_dev_attach(sk);
> + return security_ops->tun_dev_attach(sk, security);
> }
> EXPORT_SYMBOL(security_tun_dev_attach);
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 61a5336..67b3423 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4414,40 +4414,49 @@ static int selinux_tun_dev_create(void)
> NULL);
> }
>
> -static void selinux_tun_dev_post_create(struct sock *sk)
> +static int selinux_tun_dev_alloc_security(void **security)
> {
> - struct sk_security_struct *sksec = sk->sk_security;
> + struct tun_security_struct *tunsec;
>
> - /* we don't currently perform any NetLabel based labeling here and it
> - * isn't clear that we would want to do so anyway; while we could apply
> - * labeling without the support of the TUN user the resulting labeled
> - * traffic from the other end of the connection would almost certainly
> - * cause confusion to the TUN user that had no idea network labeling
> - * protocols were being used */
> + tunsec = kzalloc(sizeof(*tunsec), GFP_KERNEL);
> + if (!tunsec)
> + return -ENOMEM;
> + tunsec->sid = current_sid();
>
> - /* see the comments in selinux_tun_dev_create() about why we don't use
> - * the sockcreate SID here */
> + *security = tunsec;
> + return 0;
> +}
>
> - sksec->sid = current_sid();
> - sksec->sclass = SECCLASS_TUN_SOCKET;
> +static void selinux_tun_dev_free_security(void *security)
> +{
> + kfree(security);
> }
>
> -static int selinux_tun_dev_attach(struct sock *sk)
> +static int selinux_tun_dev_attach(struct sock *sk, void *security)
> {
> + struct tun_security_struct *tunsec = security;
> struct sk_security_struct *sksec = sk->sk_security;
> u32 sid = current_sid();
> int err;
>
> + /* we don't currently perform any NetLabel based labeling here and it
> + * isn't clear that we would want to do so anyway; while we could apply
> + * labeling without the support of the TUN user the resulting labeled
> + * traffic from the other end of the connection would almost certainly
> + * cause confusion to the TUN user that had no idea network labeling
> + * protocols were being used */
> +
> err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
> TUN_SOCKET__RELABELFROM, NULL);
> if (err)
> return err;
> - err = avc_has_perm(sid, sid, SECCLASS_TUN_SOCKET,
> + err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
> TUN_SOCKET__RELABELTO, NULL);
> if (err)
> return err;
>
> - sksec->sid = sid;
> + sksec->sid = tunsec->sid;
> + sksec->sclass = SECCLASS_TUN_SOCKET;
I'm not sure this is correct; it looks like we need to distinguish between TUNSETQUEUE and TUNSETIFF. When userspace calls TUNSETIFF on a persistent device, it looks like we need to change the sid of tunsec,
as was done in the past.
Thanks
>
> return 0;
> }
> @@ -5642,8 +5651,9 @@ static struct security_operations selinux_ops = {
> .secmark_refcount_inc = selinux_secmark_refcount_inc,
> .secmark_refcount_dec = selinux_secmark_refcount_dec,
> .req_classify_flow = selinux_req_classify_flow,
> + .tun_dev_alloc_security = selinux_tun_dev_alloc_security,
> + .tun_dev_free_security = selinux_tun_dev_free_security,
> .tun_dev_create = selinux_tun_dev_create,
> - .tun_dev_post_create = selinux_tun_dev_post_create,
> .tun_dev_attach = selinux_tun_dev_attach,
>
> #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> index 26c7eee..aa47bca 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -110,6 +110,10 @@ struct sk_security_struct {
> u16 sclass; /* sock security class */
> };
>
> +struct tun_security_struct {
> + u32 sid; /* SID for the tun device sockets */
> +};
> +
> struct key_security_struct {
> u32 sid; /* SID of key */
> };
>
* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03 10:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
jwhan, shiyer
In-Reply-To: <20121203101436.GB23009@redhat.com>
On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
>> > - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
>> > - schedule_delayed_work(&vi->rq.refill, 0);
>> > + for (i = 0; i < vi->max_queue_pairs; i++)
>> > + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>> > + schedule_delayed_work(&vi->rq[i].refill, 0);
>> >
>> > mutex_lock(&vi->config_lock);
>> > vi->config_enable = true;
>> > mutex_unlock(&vi->config_lock);
>> >
>> > + BUG_ON(virtnet_set_queues(vi));
>> > +
>> > return 0;
>> > }
>> > #endif
> Also crashing on device nack of command is also not nice.
> In this case it seems we can just switch to
> single-queue mode which should always be safe.
I'm not sure it's safe. It depends on why this call fails. If we are
left in a state where the driver uses only a single queue but the device
uses multiple queues, we may still lose network connectivity.
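
The concern above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: the structs and the model_* functions are hypothetical and are not virtio_net code, and the failure mode modeled (the device applies the new queue-pair count while the driver sees the command as failed) is one assumed possibility, not something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these structs and functions are hypothetical and are
 * not part of virtio_net or any kernel API. */
struct model_device { int pairs_in_use; };
struct model_driver { int pairs_in_use; };

/* Models one assumed failure mode: the device applies the new queue-pair
 * count, but the driver still sees the command as failed. */
static bool model_set_pairs(struct model_device *dev, int pairs,
			    bool report_failure)
{
	assert(pairs >= 1);
	dev->pairs_in_use = pairs;
	return !report_failure;
}

/* Driver-side restore path that falls back to one pair on failure. */
static void model_restore(struct model_driver *drv, struct model_device *dev,
			  int pairs, bool report_failure)
{
	if (model_set_pairs(dev, pairs, report_failure))
		drv->pairs_in_use = pairs;
	else
		drv->pairs_in_use = 1;
}

/* True when driver and device agree on the number of pairs in use. */
static bool model_in_sync(const struct model_driver *drv,
			  const struct model_device *dev)
{
	return drv->pairs_in_use == dev->pairs_in_use;
}
```

In this model, a reported failure leaves the driver on one pair while the device may still steer packets to the other receive queues, which is the mismatch described above. Whether a real device can end up in that state depends on why the command failed.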
* RE: [PATCH 5/5] smsc95xx: expand check_ macros
From: David Laight @ 2012-12-03 10:41 UTC (permalink / raw)
To: Steve Glendinning, netdev
In-Reply-To: <1354290952-27109-6-git-send-email-steve.glendinning@shawell.net>
> - check_warn_return(ret, "Error reading MII_ACCESS\n");
> + if (ret < 0) {
> + netdev_warn(dev->net, "Error reading MII_ACCESS\n");
> + return ret;
> + }
> +
It might be worth defining something like:
#define check_warn(dev, ret, errmsg) \
(ret >= 0 ? 0 : (netdev_warn(dev->net, errmsg), ret))
so the above code can be:
if (check_warn(dev, ret, "Error reading MII_ACCESS\n"))
return ret;
David
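
A user-space sketch of the suggested macro follows, so that its behaviour can be checked outside the kernel. The struct definitions, the netdev_warn() stub (which takes a plain message rather than a format string), warn_count and read_reg() are stand-ins invented for this illustration. The macro arguments are also parenthesized here, which the version quoted above does not do.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the kernel types and for netdev_warn(); hypothetical,
 * for user-space illustration only. */
struct net_device { const char *name; };
struct usbnet { struct net_device *net; };

static int warn_count;

static void netdev_warn(struct net_device *net, const char *msg)
{
	assert(net && msg);
	warn_count++;
	fprintf(stderr, "%s: %s", net->name, msg);
}

/* Evaluates to 0 when ret >= 0; otherwise logs errmsg and evaluates to
 * ret. Note that ret is evaluated twice on the error path. */
#define check_warn(dev, ret, errmsg) \
	((ret) >= 0 ? 0 : (netdev_warn((dev)->net, (errmsg)), (ret)))

/* Hypothetical caller: simulated_ret plays the role of a register read. */
static int read_reg(struct usbnet *dev, int simulated_ret)
{
	int ret = simulated_ret;

	if (check_warn(dev, ret, "Error reading MII_ACCESS\n"))
		return ret;
	return 0;
}
```

Because the macro expands to an expression, the `return` stays visible at the call site, unlike the check_warn_return() form being removed by the patch.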
* [PATCHv5] virtio-spec: virtio network device RFS support
From: Michael S. Tsirkin @ 2012-12-03 10:58 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, kvm, virtualization
Add RFS support to virtio network device.
Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
packet steering for unidirectional protocols.
---
Changes from v5:
- Address Rusty's comments.
Changes are only in the text, not the ideas.
- Some minor formatting changes.
Changes from v4:
- address Jason's comments
- have configuration specify the number of VQ pairs and not pairs - 1
Changes from v3:
- rename multiqueue -> rfs: this is what we support
- Be more explicit about what driver should do.
- Simplify layout making VQs functionality depend on feature.
- Remove unused commands, only leave in programming # of queues
Changes from v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
this is both clearer and easier to support.
It does not look like we need a separate steering command
since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2
Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
looks nicer to me.
- documented packet steering, added a generalized steering programming
command. Current modes are single queue and host driven multiqueue,
but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
but makes it more efficient to switch between modes as
we can avoid this causing packet reordering.
Rusty, could you please take a look and comment soon?
If this looks OK to everyone, we can proceed with finalizing the
implementation. Would be nice to try and put it in 3.8.
virtio-spec.lyx | 310 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 301 insertions(+), 9 deletions(-)
diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 83f2771..119925c 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
\author -608949062 "Rusty Russell,,,"
\author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com
\author 1531152142 "Paolo Bonzini,,,"
+\author 1986246365 "Michael S. Tsirkin"
\end_header
\begin_body
@@ -4170,9 +4171,46 @@ ID 1
\end_layout
\begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+
+\change_inserted 1986246365 1352742952
+.
+ ....
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531595
+N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is derived from
+\emph on
+max_virtqueue_pairs
+\emph default
+ control
+\emph on
+
+\emph default
+field.
+
+\end_layout
+
+\end_inset
+
+: receivqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
\begin_inset Foot
status open
@@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN
\begin_layout Description
VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_RFS(22) Device supports Receive Flow Steering.
+\change_unchanged
+
\end_layout
\end_deeper
@@ -4355,11 +4403,45 @@ configuration
\begin_inset space ~
\end_inset
-layout Two configuration fields are currently defined.
+layout
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1354531413
+Three
+\change_unchanged
+ configuration fields are currently defined.
The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1354531470
+ The following read-only field,
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_RFS is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively;
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs - 1
+\emph default
+) that can be configured once VIRTIO_NET_F_RFS is negotiated.
+ Legal values for this field are 1 to 8000h.
+
+\change_unchanged
\begin_inset listings
inline false
@@ -4392,6 +4474,17 @@ struct virtio_net_config {
\begin_layout Plain Layout
u16 status;
+\change_inserted 1986246365 1354531427
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531437
+
+ u16 max_virtqueue_pairs;
+\change_unchanged
+
\end_layout
\begin_layout Plain Layout
@@ -4410,7 +4503,24 @@ Device Initialization
\begin_layout Enumerate
The initialization routine should identify the receive and transmission
- virtqueues.
+ virtqueues
+\change_inserted 1986246365 1352744077
+, up to N+1 of each kind
+\change_unchanged
+.
+
+\change_inserted 1986246365 1352743942
+ If VIRTIO_NET_F_RFS feature bit is negotiated,
+\emph on
+N=max_virtqueue_pairs-1
+\emph default
+, otherwise identify
+\emph on
+N=0
+\emph default
+.
+\change_unchanged
+
\end_layout
\begin_layout Enumerate
@@ -4452,10 +4562,33 @@ status
config field.
Otherwise, the link should be assumed active.
+\change_inserted 1986246365 1354529306
+
\end_layout
\begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+
+\change_inserted 1986246365 1354531717
+Only receiveq0, transmitq0 and controlq are used by default.
+ To use more queues driver must negotiate the VIRTIO_NET_F_RFS feature;
+ initialize up to
+\emph on
+max_virtqueue_pairs
+\emph default
+ of each of transmit and receive queues; execute_VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SE
+T command specifying the number of the transmit and receive queues that
+ is going to be used and wait until the device consumes the controlq buffer
+ and acks this command.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Enumerate
+The receive virtqueue
+\change_inserted 1986246365 1352743953
+s
+\change_unchanged
+ should be filled with receive buffers.
This is described in detail below in
\begin_inset Quotes eld
\end_inset
@@ -4550,8 +4683,15 @@ Device Operation
\end_layout
\begin_layout Standard
-Packets are transmitted by placing them in the transmitq, and buffers for
- incoming packets are placed in the receiveq.
+Packets are transmitted by placing them in the transmitq
+\change_inserted 1986246365 1353593685
+0..transmitqN
+\change_unchanged
+, and buffers for incoming packets are placed in the receiveq
+\change_inserted 1986246365 1353593692
+0..receiveqN
+\change_unchanged
+.
In each case, the packet itself is preceeded by a header:
\end_layout
@@ -4861,6 +5001,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the
struct virtio_net_hdr
\family default
.
+\change_inserted 1986246365 1353594518
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594638
+If VIRTIO_NET_F_RFS is negotiated, each of receiveq0...receiveqN that will
+ be used should be populated with receive buffers.
+\change_unchanged
+
\end_layout
\begin_layout Subsection*
@@ -5293,8 +5444,149 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
\end_layout
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1353593879
+Packet Receive Flow Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528882
+If the driver negotiates the VIRTIO_NET_F_RFS feature bit (depends on VIRTIO_NET
+_F_CTRL_VQ), it can transmit outgoing packets on one of the multiple transmitq0..t
+ransmitqN and ask the device to queue incoming packets into one the multiple
+ receiveq0..receiveqN depending on the packet flow.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594292
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594178
+
+struct virtio_net_ctrl_rfs {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594212
+
+ u16 virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594263
+
+#define VIRTIO_NET_CTRL_RFS 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX 0x8000
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354531492
+RFS acceleration is disabled by default.
+ Driver enables RFS by executing the VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET command,
+ specifying the number of the transmit and receive queues that will be used;
+ thus transmitq0..transmitqn and receiveq0..receiveqn where
+\emph on
+n=virtqueue_pairs-1
+\emph default
+ will be used.
+ All these virtqueues must have been pre-configured in advance.
+ The range of legal values for the
+\emph on
+ virtqueue_pairs
+\emph off
+ field is between 1 and
+\emph on
+max_virtqueue_pairs
+\emph off
+.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353595328
+Programming of the receive flow classificator is implicit.
+ Transmitting a packet of a specific flow on transmitqX will cause incoming
+ packets for this flow to be steered to receiveqX.
+ For uni-directional protocols, or where no packets have been transmitted
+ yet, device will steer a packet to a random queue out of the specified
+ receiveq0..receiveqn.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528710
+RFS acceleration is disabled by setting
+\emph on
+virtqueue_pairs = 1
+\emph default
+ (this is the default).
+ After the command is consumed by the device, the device will not steer
+ new packets on virtqueues receveq1..receiveqN (i.e.
+ other than receiveq0) nor read from transmitq1..transmitqN (i.e.
+ other than transmitq0); accordingly, driver should not transmit new packets
+ on virtqueues other than transmitq0.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1353593873
.
+
+\change_unchanged
\end_layout
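
The virtqueue layout described in the patch above (2N: receiveqN, 2N+1: transmitqN, 2N+2: controlq, with N = max_virtqueue_pairs - 1) can be sketched as index helpers in C. This is a minimal illustration under that layout, not driver code: the function names and the VQ_PAIRS_* constants are chosen for the example, and only the numeric limits 1 and 0x8000 are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits taken from the draft text; the macro names are illustrative. */
#define VQ_PAIRS_MIN 1
#define VQ_PAIRS_MAX 0x8000

/* receiveqI lives at virtqueue index 2*I. */
static int rxq2vq(int rxq)
{
	assert(rxq >= 0);
	return 2 * rxq;
}

/* transmitqI lives at virtqueue index 2*I + 1. */
static int txq2vq(int txq)
{
	assert(txq >= 0);
	return 2 * txq + 1;
}

/* controlq follows the last pair: 2N + 2 with N = max_pairs - 1.
 * Pass max_pairs = 1 when the feature is not negotiated (N = 0). */
static int cvq_index(int max_pairs)
{
	assert(max_pairs >= VQ_PAIRS_MIN && max_pairs <= VQ_PAIRS_MAX);
	return 2 * max_pairs;
}

/* Inverse mappings; the caller must rule out the controlq index first. */
static int vq2rxq(int vq)
{
	assert(vq >= 0 && vq % 2 == 0);
	return vq / 2;
}

static int vq2txq(int vq)
{
	assert(vq >= 1 && vq % 2 == 1);
	return vq / 2;
}

/* virtqueue_pairs must lie between 1 and max_virtqueue_pairs. */
static bool pairs_valid(int pairs, int max_pairs)
{
	return max_pairs >= VQ_PAIRS_MIN && max_pairs <= VQ_PAIRS_MAX &&
	       pairs >= VQ_PAIRS_MIN && pairs <= max_pairs;
}
```

With the control queue placed last, receive queues are always even and transmit queues always odd, so the mapping needs no special case for queue 0.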