Netdev List
 help / color / mirror / Atom feed
* [net-next.git 0/3] stmmac: review driver Koptions
From: Giuseppe Cavallaro @ 2014-10-15  8:25 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro

These patches are to rework some Koption used for configuring
the driver. Recently many options have been added and the main
goal behind this work is to guarantee that the driver built fine
on all the branches where it is present. All the Koption for
platform drivers can be safely removed (for example never added
for SPEAr). In that case, it will be possible to validate the
build for all the platforms like STi avoiding failures.
The patches have been tested on STi Boxes w/o issue at runtime.

Giuseppe Cavallaro (3):
  stmmac: remove specific SoC Koption from platform.
  stmmac: remove STMMAC_DEBUG_FS
  stmmac: remove BUS_MODE_DA

 drivers/net/ethernet/stmicro/stmmac/Kconfig        |   68 +------------------
 drivers/net/ethernet/stmicro/stmmac/Makefile       |    8 +--
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |    4 -
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    9 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   14 ++--
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   10 +---
 6 files changed, 16 insertions(+), 97 deletions(-)

-- 
1.7.4.4

^ permalink raw reply

* Re: [PATCH net 2/3] net: sctp: fix panic on duplicate ASCONF chunks
From: Vlad Yasevich @ 2014-10-15  7:51 UTC (permalink / raw)
  To: Daniel Borkmann, Neil Horman; +Cc: davem, linux-sctp, netdev
In-Reply-To: <543B0DD7.7000900@redhat.com>

On 10/13/2014 01:25 AM, Daniel Borkmann wrote:
> On 10/12/2014 03:42 AM, Neil Horman wrote:
>> On Sat, Oct 11, 2014 at 12:02:31AM +0200, Daniel Borkmann wrote:
>>> On 10/10/2014 05:39 PM, Neil Horman wrote:
>>> ...
>>>> Is it worth adding a WARN_ON, to indicate that two ASCONF chunks have been
>>>> received with duplicate serials?
>>>
>>> Don't think so, as this would be triggerable from outside.
>>>
>> WARN_ON_ONCE then, per serial number?
> 
> Sorry, but no. If someone seriously runs that in production and it
> triggers a WARN() from outside, admins will start sending us bug
> reports that apparently something with the kernel code is wrong.
> 
> WARN() should only be used if we have some *internal* unexpected bug,
> but can still fail gracefully. This would neither be an actual code bug
> nor would it be an internally triggered one, plus we add unnecessary
> complexity to the code. Similarly, for those reasons we don't WARN()
> and throw a stack trace when we receive, say, an skb of invalid length
> elsewhere.
> 
> I'd also like to avoid any additional pr_debug().

Why avoid additional pr_debugs()?  The seem cheap, and in this particular
case, they would be rather useful.

> 
> I don't think people enable them in production, and if they really do,
> it's too late anyway as we already have received this chunk. If anything,
> I'd rather like to see debugging code further removed as we have already
> different facilities in the kernel for runtime debugging that are much
> more powerful.

Such as?  pr_debug is part of dynamic debugging which is what we want here.
In prod environments, we don't want any output, but when someone needs to
figure out why something doesn't work, it is very useful to have.

-vlad

^ permalink raw reply

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
From: Vasily Averin @ 2014-10-15  7:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet
In-Reply-To: <20141014.161225.1399177558139744041.davem@davemloft.net>



On 15.10.2014 00:12, David Miller wrote:
> From: Vasily Averin <vvs@parallels.com>
> Date: Tue, 14 Oct 2014 08:57:14 +0400
> 
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>
>> Signed-off-by: Vasily Averin <vvs@parallels.com>
> 
> Why doesn't ip_make_skb() need the same fix?  It seems to do the same
> thing.

It seems for me ip_make_skb() works (almost) correctly,
but seems refcounting can be is incorrect if queue can be not empty 
(Please see details below).

If __ip_append_data() returns errors ip_make_skb() calls 
__ip_flush_pending_frames() that calls ip_cork_release() inside
and frees stolen dst_entry.

If __ip_append_data() returns success -- dst refcounter changes are not required.
In this case skb will be created and added to queue (and it will not be empty)
Later in __ip_make_skb() these skb will get dst reference,
and refcounter will be decremented during kfree_skb(). 

I do not like that there is such unclear dependency between functions,
but seems currently it works correctly.

However I afraid dst refcountng can work incorrectly if sk_write_queue
can be not empty at the moment of ip_append_data() call.
It was not happen in case ip_send_unicast_reply() but probably
can happen in other places.

Let's calculate dst refcounters changes in this case.

First packet:

dst_refcount increment was happen in ip_append_data() caller, taken during rt lookup
- ip_append_data():
--  sk_write_queue is empty, ip_setup_cork() steals dst entry
-- __ip_append_data() adds skb to queue, queue is not flushed, waiting for next packets.
ip_rt_put in ip_append_data() caller does not work, because dst reference was stolen.

dst refcount here +1

then we want to sent 2nd packet:
dst refcount increment was happen in ip_append_data() caller
- ip_append_data():
-- sk_write_queue is NOT empty, dst was not stolen
-- __ip_append_data() adds skb to queue
ip_rt_put in ip_append_data() caller decrements dst refcount, because it as not stolen

dst refcount here +1 

Then we handle new packets, all of them are added to queue

dst refcount is still +1

Then queue is flushed.
Each packet in queue get dst reference from cork,
Each kfree_skb decrements dst refcounter, and it may become negative.

Am I wrong probably?

^ permalink raw reply

* [PATCH] atm: simplify lanai.c by using module_pci_driver
From: Michael Opdenacker @ 2014-10-15  7:45 UTC (permalink / raw)
  To: chas; +Cc: linux-atm-general, netdev, linux-kernel, Michael Opdenacker

This simplifies the lanai.c driver by using
the module_pci_driver() macro, at the expense
of losing only debugging messages.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
---
 drivers/atm/lanai.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/atm/lanai.c b/drivers/atm/lanai.c
index fa7d701933ba..93eaf8d94492 100644
--- a/drivers/atm/lanai.c
+++ b/drivers/atm/lanai.c
@@ -2614,27 +2614,7 @@ static struct pci_driver lanai_driver = {
 	.probe    = lanai_init_one,
 };
 
-static int __init lanai_module_init(void)
-{
-	int x;
-
-	x = pci_register_driver(&lanai_driver);
-	if (x != 0)
-		printk(KERN_ERR DEV_LABEL ": no adapter found\n");
-	return x;
-}
-
-static void __exit lanai_module_exit(void)
-{
-	/* We'll only get called when all the interfaces are already
-	 * gone, so there isn't much to do
-	 */
-	DPRINTK("cleanup_module()\n");
-	pci_unregister_driver(&lanai_driver);
-}
-
-module_init(lanai_module_init);
-module_exit(lanai_module_exit);
+module_pci_driver(lanai_driver);
 
 MODULE_AUTHOR("Mitchell Blank Jr <mitch@sfgoth.com>");
 MODULE_DESCRIPTION("Efficient Networks Speedstream 3010 driver");
-- 
1.9.1

^ permalink raw reply related

* RE: [PATCH] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: fugang.duan @ 2014-10-15  7:42 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Frank.Li@freescale.com, David Miller, netdev@vger.kernel.org,
	bhutchings@solarflare.com
In-Reply-To: <20141015072540.GA4172@localhost.localdomain>

From: Richard Cochran <richardcochran@gmail.com> Sent: Wednesday, October 15, 2014 3:26 PM
>To: Duan Fugang-B38611
>Cc: Li Frank-B20596; David Miller; netdev@vger.kernel.org;
>bhutchings@solarflare.com
>Subject: Re: [PATCH] net: fec: ptp: fix convergence issue to support
>LinuxPTP stack
>
>On Wed, Oct 15, 2014 at 01:30:41AM +0000, fugang.duan@freescale.com wrote:
>>
>> Pls read the commit log in entire ?
>
>I did read it.
>
>It is incomprehensible.
>
>Can you please fix it?
>
>Thanks,
>Richard

Ok, I will enhance the commit/comment log and then send out V2.

Thanks,
Andy

^ permalink raw reply

* Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt
From: Jason Wang @ 2014-10-15  7:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <20141014230627.GA23715@redhat.com>

On 10/15/2014 07:06 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
>> > From: Jason Wang <jasowang@redhat.com>
>> > Date: Sat, 11 Oct 2014 15:16:43 +0800
>> > 
>>> > > We free old transmitted packets in ndo_start_xmit() currently, so any
>>> > > packet must be orphaned also there. This was used to reduce the overhead of
>>> > > tx interrupt to achieve better performance. But this may not work for some
>>> > > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
>>> > > implement various optimization for small packets stream such as TCP small
>>> > > queue and auto corking. But orphaning packets early in ndo_start_xmit()
>>> > > disable such things more or less since sk_wmem_alloc was not accurate. This
>>> > > lead extra low throughput for TCP stream of small writes.
>>> > > 
>>> > > This series tries to solve this issue by enable tx interrupts for all TCP
>>> > > packets other than the ones with push bit or pure ACK. This is done through
>>> > > the support of urgent descriptor which can force an interrupt for a
>>> > > specified packet. If tx interrupt was enabled for a packet, there's no need
>>> > > to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
>>> > > by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
>>> > > can batch more for small write. More larger skb was produced by TCP in this
>>> > > case to improve both throughput and cpu utilization.
>>> > > 
>>> > > Test shows great improvements on small write tcp streams. For most of the
>>> > > other cases, the throughput and cpu utilization are the same in the
>>> > > past. Only few cases, more cpu utilization was noticed which needs more
>>> > > investigation.
>>> > > 
>>> > > Review and comments are welcomed.
>> > 
>> > I think proper accounting and queueing (at all levels, not just TCP
>> > sockets) is more important than trying to skim a bunch of cycles by
>> > avoiding TX interrupts.
>> > 
>> > Having an event to free the SKB is absolutely essential for the stack
>> > to operate correctly.
>> > 
>> > And with virtio-net you don't even have the excuse of "the HW
>> > unfortunately doesn't have an appropriate TX event."
>> > 
>> > So please don't play games, and instead use TX interrupts all the
>> > time.  You can mitigate them in various ways, but don't turn them on
>> > selectively based upon traffic type, that's terrible.
>> > 
>> > You can even use ->xmit_more to defer the TX interrupt indication to
>> > the final TX packet in the chain.
> I guess we can just defer the kick, interrupt will naturally be
> deferred as well.
> This should solve the problem for old hosts as well.

Interrupt were delayed but not reduced, to support this we need publish
avail idx as used event. This should reduce the tx interrupt in the case
of bulk dequeuing.

I will draft a new rfc series contain this.
>
> We'll also need to implement bql for this.
> Something like the below?
> Completely untested - posting here to see if I figured the
> API out correctly. Has to be applied on top of the previous patch.

Looks so. I believe better to have but not a must.

^ permalink raw reply

* Re: [PATCH] nf_conntrack_proto_tcp: allow server to become a client in TW handling
From: Jozsef Kadlecsik @ 2014-10-15  7:27 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <a9618c5db692413c2970201b5206bcd357438399.1413216136.git.mleitner@redhat.com>

On Mon, 13 Oct 2014, Marcelo Ricardo Leitner wrote:

> When a port that was used to listen for inbound connections gets closed
> and reused for outgoing connections (like rsh ends up doing for stderr
> flow), current we may reject the SYN/ACK packet for the new connection
> because tcp_conntracks states forbirds a port to become a client while
> there is still a TIME_WAIT entry in there for it.
> 
> As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
> for it is 120s, there is a ~60s window that the application can end up
> opening a port that conntrack will end up blocking.
> 
> This patch fixes this by simply allowing such state transition: if we
> see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
> that the rest of the code already handles this situation, more
> specificly in tcp_packet(), first switch clause.

In those code branch if there was a valid FIN in either direction, we 
destroy the old connection and a new will be created. That way the rules 
about NEW connections will be applied, so the policies are not bypassed. 
Otherwise we just ignore the SYN packet, so if it's invalid, we'll catch 
the RST from the other side and destroy the conntrack entry. The event 
flow looks OK to me.

> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Best regards,
Jozsef

> ---
>  net/netfilter/nf_conntrack_proto_tcp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 44d1ea32570a07338dc39f34624bd823b6f76916..d87b6423ffb21e0f8f9b6ef25ef51c1cb5f54ad6 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -213,7 +213,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
>  	{
>  /* REPLY */
>  /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
> -/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2 },
> +/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sSS, sIV, sS2 },
>  /*
>   *	sNO -> sIV	Never reached.
>   *	sSS -> sS2	Simultaneous open
> @@ -223,7 +223,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
>   *	sFW -> sIV
>   *	sCW -> sIV
>   *	sLA -> sIV
> - *	sTW -> sIV	Reopened connection, but server may not do it.
> + *	sTW -> sSS	Reopened connection, but server may have switched role
>   *	sCL -> sIV
>   */
>  /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* [PATCH net] cxgb4i : Fix -Wmaybe-uninitialized warning.
From: Anish Bhatt @ 2014-10-15  7:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, kxie, Anish Bhatt

Identified by kbuild test robot. csk family is always set to be AF_INET or 
AF_INET6, so skb will always be initialized to some value but there is no harm 
in silencing the warning anyways.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Fixes : f42bb57c61fd ('cxgb4i : Fix -Wunused-function warning')
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 8c3003b..3e0a0d3 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -758,7 +758,7 @@ static int act_open_rpl_status_to_errno(int status)
 
 static void csk_act_open_retry_timer(unsigned long data)
 {
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	struct cxgbi_sock *csk = (struct cxgbi_sock *)data;
 	struct cxgb4_lld_info *lldi = cxgbi_cdev_priv(csk->cdev);
 	void (*send_act_open_func)(struct cxgbi_sock *, struct sk_buff *,
-- 
2.1.2

^ permalink raw reply related

* Re: [PATCH] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: Richard Cochran @ 2014-10-15  7:25 UTC (permalink / raw)
  To: fugang.duan@freescale.com
  Cc: Frank.Li@freescale.com, David Miller, netdev@vger.kernel.org,
	bhutchings@solarflare.com
In-Reply-To: <ca979a5eb7764367a6b8b944948cbd1a@BLUPR03MB373.namprd03.prod.outlook.com>

On Wed, Oct 15, 2014 at 01:30:41AM +0000, fugang.duan@freescale.com wrote:
> 
> Pls read the commit log in entire ?

I did read it. 

It is incomprehensible.

Can you please fix it?

Thanks,
Richard

^ permalink raw reply

* [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
enable tx interrupt only for the final skb in the chain if
needed. This will help to mitigate tx interrupts.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
-	} else if (virtqueue_enable_cb(sq->vq)) {
-		free_old_xmit_skbs(sq, qsize);
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
 		virtqueue_kick(sq->vq);
+		if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS &&
+		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+			free_old_xmit_skbs(sq, qsize);
+			virtqueue_disable_cb(sq->vq);
+		}
+	}
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

Orphan skb in ndo_start_xmit() breaks socket accounting and packet
queuing. This in fact breaks lots of things such as pktgen and several
TCP optimizations. And also make BQL can't be implemented for
virtio-net.

This patch tries to solve this issue by enabling tx interrupt. To
avoid introducing extra spinlocks, a tx napi was scheduled to free
those packets.

More tx interrupt mitigation method could be used on top.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
 1 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
+
+	while (tx_packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		tx_bytes += skb->len;
+		tx_packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
 {
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	u64 tx_bytes = 0, tx_packets = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		tx_bytes += skb->len;
-		tx_packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += tx_bytes;
-	stats->tx_packets =+ tx_packets;
-	u64_stats_update_end(&stats->tx_syncp);
-
-	return tx_packets;
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
+
 	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
 
@@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 
+	virtqueue_disable_cb(sq->vq);
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, qsize);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, qsize);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
+	} else if (virtqueue_enable_cb(sq->vq)) {
+		free_old_xmit_skbs(sq, qsize);
 	}
 
 	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
@@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs()
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

This patch returns the number of packets sent in free_old_xmit_skbs(), this is
necessary for calling this function in napi.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a4d56b8..ccf98f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -827,7 +827,7 @@ static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static int free_old_xmit_skbs(struct send_queue *sq)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -848,6 +848,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	stats->tx_bytes += tx_bytes;
 	stats->tx_packets =+ tx_packets;
 	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

Accumulate the sent packets and sent bytes in local variables and perform a
single u64_stats_update_begin/end() after.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

This patch introduces virtio_enable_cb_avail() to publish avail idx
and used event. This could be used by batched buffer submitting to
reduce the number of tx interrupts.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
 include/linux/virtio.h       |    2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* Make sure used event never go backwards */
-	if (!vring_need_event(vring_used_event(&vq->vring),
-			      vq->vring.avail->idx, last_used_idx))
+	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+	    !vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx)) {
 		vring_used_event(&vq->vring) = last_used_idx;
+	}
 	END_USE(vq);
 	return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool ret;
+
+	START_USE(vq);
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->vring.avail->idx;
+	ret = vring_need_event(vq->vring.avail->idx,
+			       vq->last_used_idx, vq->vring.used->idx);
+	END_USE(vq);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1413357930-45302-1-git-send-email-jasowang@redhat.com>

This patch checks the new event idx to make sure used event idx never
goes back. This is used to synchronize the calls between
virtqueue_enable_cb_delayed() and virtqueue_enable_cb().

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3b1f89b..1b3929f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	u16 last_used_idx;
 
 	START_USE(vq);
-
+	last_used_idx = vq->last_used_idx;
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
 	 * either clear the flags bit or point the event index at the next
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
-	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
+	/* Make sure used event never go backwards */
+	if (!vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx))
+		vring_used_event(&vq->vring) = last_used_idx;
 	END_USE(vq);
 	return last_used_idx;
 }
-- 
1.7.1

^ permalink raw reply related

* [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet

According to David, proper accounting and queueing (at all levels, not
just TCP sockets) is more important than trying to skim a bunch of
cycles by avoiding TX interrupts. Having an event to free the SKB is
absolutely essential for the stack to operate correctly.

This series tries to enable tx interrupt for virtio-net. The idea is
simple: enable tx interrupt and schedule a tx napi to free old xmit
skbs.

Several notes:
- Tx interrupt storm avoidance when queue is about to be full is
  kept. Since we may enable callbacks in both ndo_start_xmit() and tx
  napi, patch 1 adds a check to make sure used event never go
  back. This will let the napi not enable the callbacks wrongly after
  delayed callbacks was used.
- For bulk dequeuing, there's no need to enable tx interrupt for each
  packet. The last patch only enable tx interrupt for the final skb in
  the chain through xmit_more and a new helper to publish current avail
  idx as used event.

This series fixes several issues of original rfc pointed out by Michael.

Smoking test is done, and will do complete netperf test. Send the
seires for early review.

Thanks

Jason Wang (6):
  virtio: make sure used event never go backwards
  virtio: introduce virtio_enable_cb_avail()
  virtio-net: small optimization on free_old_xmit_skbs()
  virtio-net: return the number of packets sent in free_old_xmit_skbs()
  virtio-net: enable tx interrupt
  virtio-net: enable tx interrupt only for the final skb in the chain

 drivers/net/virtio_net.c     |  125 ++++++++++++++++++++++++++++++------------
 drivers/virtio/virtio_ring.c |   25 ++++++++-
 include/linux/virtio.h       |    2 +
 3 files changed, 115 insertions(+), 37 deletions(-)

^ permalink raw reply

* Re: [PATCH v4 03/25] virtio-pci: move freeze/restore to virtio core
From: Paul Bolle @ 2014-10-15  7:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Valentin Rothberg, linux-kernel, Rusty Russell, virtualization,
	linux-scsi, linux-s390, v9fs-developer, netdev, kvm, Amit Shah,
	Cornelia Huck, Christian Borntraeger, David S. Miller,
	Paolo Bonzini, Heinz Graalfs
In-Reply-To: <1413114332-626-4-git-send-email-mst-v4@redhat.com>

On Mon, 2014-10-13 at 10:48 +0300, Michael S. Tsirkin wrote:
> This is in preparation to extending config changed event handling
> in core.
> Wrapping these in an API also seems to make for a cleaner code.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

This patch landed in today's linux-next (next-20141015).

>  include/linux/virtio.h      |  6 +++++
>  drivers/virtio/virtio.c     | 54 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_pci.c | 54 ++-------------------------------------------
>  3 files changed, 62 insertions(+), 52 deletions(-)
> 
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 3c19bd3..8df7ba8 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
>  /**
>   * virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> + * @failed: saved value for CONFIG_S_FAILED bit (for restore)

s/CONFIG_S_FAILED/VIRTIO_CONFIG_S_FAILED/?

>   * @dev: underlying device.
>   * @id: the device type identification (used to match it with a driver).
>   * @config: the configuration ops for this device.


Paul Bolle

^ permalink raw reply

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
From: Vasily Averin @ 2014-10-15  6:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1413348385.12304.0.camel@edumazet-glaptop2.roam.corp.google.com>

On 15.10.2014 08:46, Eric Dumazet wrote:
> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> 
> I thought this was done by ip_flush_pending_frames() ?

Take look at ip_send_unicast_reply():

ip_flush_pending_frames() is not called if skb was not added to sk_write_queue.
And ip_rt_put() does not work, because dst entry was stolen in ip_setup_cork().

Probably it can happen in raw_sendmsg() and udp_sendmsg() too.

^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-15  5:47 UTC (permalink / raw)
  To: Oliver Hartkopp, davem@davemloft.net, jkosina@suse.cz
  Cc: netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org, Jiafei.Pan@freescale.com
In-Reply-To: <543DFF2C.40306@hartkopp.net>



> -----Original Message-----
> From: Oliver Hartkopp [mailto:socketcan@hartkopp.net]
> Sent: Wednesday, October 15, 2014 12:59 PM
> To: Pan Jiafei-B37022; davem@davemloft.net; jkosina@suse.cz
> Cc: netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On 15.10.2014 05:26, Pan Jiafei wrote:
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance.
> 
> (..)
[Pan Jiafei] I want to build a general patch to build skb
by using hardware buffer manager block.
> 
> > diff --git a/net/Kconfig b/net/Kconfig
> > index d6b138e..346e021 100644
> > --- a/net/Kconfig
> > +++ b/net/Kconfig
> > @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
> >   	  with many clients some protection against DoS by a single (spoofed)
> >   	  flow that greatly exceeds average workload.
> >
> > +config USE_HW_SKB
> > +	bool "NIC use hardware managed buffer to build skb"
> > +	depends on INET
> 
> The feature seems to be valid for network devices in general.
> Why did you make it depending on INET ??
> 
> Regards,
> Oliver
> 
[Pan Jiafei] Yes, INET dependency should be removed, thanks.
> > +	---help---
> > +	  If select this, the third party drivers will use hardware managed
> > +	  buffers to allocate SKB without any modification for the driver.
> > +
> > +	  Documentation on how to use this featue can be found at
> > +	  <file:Documentation/networking/hw_skb.txt>.
> > +
> >   menu "Network testing"
> >
> >   config NET_PKTGEN


^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-15  5:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem@davemloft.net, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org,
	Jiafei.Pan@freescale.com
In-Reply-To: <1413346503.17109.23.camel@edumazet-glaptop2.roam.corp.google.com>



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, October 15, 2014 12:15 PM
> To: Pan Jiafei-B37022
> Cc: davem@davemloft.net; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-
> R58472; linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On Wed, 2014-10-15 at 11:26 +0800, Pan Jiafei wrote:
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance. So in some case,
> > it is expected that the packets received by some generic
> > NIC should be put into such hardware managed buffers
> > directly, so that such buffer can be released by hardware
> > or by driver.
> 
> You repeat 'some' four times.
> 
> >
> > This patch provide such general APIs for generic NIC to
> > use hardware block managed buffers without any modification
> > for generic NIC drivers.
> 
> ...
> 
> > In this patch, the following fields are added to "net_device":
> >     void *hw_skb_priv;
> >     struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
> >     void (*free_hw_skb)(struct sk_buff *skb);
> > so in order to let generic NIC driver to use hardware managed
> > buffers, the function "alloc_hw_skb" and "free_hw_skb"
> > provide implementation for allocate and free hardware managed
> > buffers. "hw_skb_priv" is provided to pass some private data for
> > these two functions.
> >
> > When the socket buffer is allocated by these APIs, "hw_skb_state"
> > is provided in struct "sk_buff". this argument can indicate
> > that the buffer is hardware managed buffer, this buffer
> > should freed by software or by hardware.
> >
> > Documentation on how to use this featue can be found at
> > <file:Documentation/networking/hw_skb.txt>.
> >
> > Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com>
> 
> 
> I am giving a strong NACK, of course.
> 
> We are not going to grow sk_buff and add yet another conditional in fast
> path for a very obscure feature like that.
> 
> Memory management is not going to be done by drivers.
> 
> The way it should work is that if your hardware has specific needs, rx
> and tx paths (of the driver) need to make the needed adaptation.
> Not the other way.
> 
> We already have complex skb layouts, we do not need a new one.
> 
> Take a look at how drivers can 'lock' pages already, and build skb sith
> page frags. It is already there.
> 
For my case, there are some shortcoming to use page frags. Firstly, I have to
modify each Ethernet drivers to support it especially I don’t which vendor's
driver the customer will use. Secondly, it is not enough only 
build skb by 'lock' pages, the buffer address comes from hardware block and
should be aligned to hardware block.

^ permalink raw reply

* Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
From: Rusty Russell @ 2014-10-15  5:40 UTC (permalink / raw)
  To: Jason Wang, mst, virtualization, netdev, linux-kernel; +Cc: linux-api, kvm
In-Reply-To: <1413011806-3813-2-git-send-email-jasowang@redhat.com>

Jason Wang <jasowang@redhat.com> writes:
> Below should be useful for some experiments Jason is doing.
> I thought I'd send it out for early review/feedback.
>
> event idx feature allows us to defer interrupts until
> a specific # of descriptors were used.
> Sometimes it might be useful to get an interrupt after
> a specific descriptor, regardless.
> This adds a descriptor flag for this, and an API
> to create an urgent output descriptor.
> This is still an RFC:
> we'll need a feature bit for drivers to detect this,
> but we've run out of feature bits for virtio 0.X.
> For experimentation purposes, drivers can assume
> this is set, or add a driver-specific feature bit.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.

^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-15  5:34 UTC (permalink / raw)
  To: David Miller
  Cc: jkosina@suse.cz, netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org, Jiafei.Pan@freescale.com
In-Reply-To: <20141015.002514.384962932982389732.davem@davemloft.net>



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, October 15, 2014 12:25 PM
> To: Pan Jiafei-B37022
> Cc: jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-
> doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> From: Pan Jiafei <Jiafei.Pan@freescale.com>
> Date: Wed, 15 Oct 2014 11:26:11 +0800
> 
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance. So in some case,
> > it is expected that the packets received by some generic
> > NIC should be put into such hardware managed buffers
> > directly, so that such buffer can be released by hardware
> > or by driver.
> >
> > This patch provide such general APIs for generic NIC to
> > use hardware block managed buffers without any modification
> > for generic NIC drivers.
> 
> Why are existing interfaces insufficent for your needs?
> 
> Several ethernet drivers already build SKBs from block
> buffer pools.
> 
Yes, for this matter, in order to do this we should modify the Ethernet
drivers. For example, driver A want to driver B, C, D.. to support driver
A's Hardware block access functions, so we have to modify driver B, C, D...
It will be so complex for this matter.

But by using my patch, I just modify a Ethernet device (I don't care
Which driver it is used) flag in driver A in order to implement this
Ethernet device using hardware block access functions provided by
Driver A.

> They allocate pools of pages which the hardware divides into various
> powers of 2, then the RX descriptor says what pieces of which pools
> were used to hold the data for a packet, and then the SKB is
> constructed with page frags pointing to those locations.
> 
> It's very cheap, inexpensive, and existing APIs are considered to
> cover all use cases.

^ permalink raw reply

* [PATCH (net.git) 1/1] stmmac: fix sti compatibililies
From: Giuseppe Cavallaro @ 2014-10-15  5:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfr, linux-kernel, linux-next, Giuseppe Cavallaro

this patch is to fix the stmmac data compatibilities for
all the SoCs inside the platform file.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    3 ++-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |    8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 4452889..c3c4065 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -144,7 +144,8 @@ extern const struct stmmac_of_data meson6_dwmac_data;
 extern const struct stmmac_of_data sun7i_gmac_data;
 #endif
 #ifdef CONFIG_DWMAC_STI
-extern const struct stmmac_of_data sti_gmac_data;
+extern const struct stmmac_of_data stih4xx_dwmac_data;
+extern const struct stmmac_of_data stid127_dwmac_data;
 #endif
 #ifdef CONFIG_DWMAC_SOCFPGA
 extern const struct stmmac_of_data socfpga_gmac_data;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 0d6b9ad..db56fa7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -37,10 +37,10 @@ static const struct of_device_id stmmac_dt_ids[] = {
 	{ .compatible = "allwinner,sun7i-a20-gmac", .data = &sun7i_gmac_data},
 #endif
 #ifdef CONFIG_DWMAC_STI
-	{ .compatible = "st,stih415-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stih416-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stid127-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stih407-dwmac", .data = &sti_gmac_data},
+	{ .compatible = "st,stih415-dwmac", .data = &stih4xx_dwmac_data},
+	{ .compatible = "st,stih416-dwmac", .data = &stih4xx_dwmac_data},
+	{ .compatible = "st,stid127-dwmac", .data = &stid127_dwmac_data},
+	{ .compatible = "st,stih407-dwmac", .data = &stih4xx_dwmac_data},
 #endif
 #ifdef CONFIG_DWMAC_SOCFPGA
 	{ .compatible = "altr,socfpga-stmmac", .data = &socfpga_gmac_data },
-- 
1.7.4.4

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net tree
From: Giuseppe CAVALLARO @ 2014-10-15  5:24 UTC (permalink / raw)
  To: David Miller, sfr; +Cc: netdev, linux-next, linux-kernel
In-Reply-To: <20141014.220254.144210672646232483.davem@davemloft.net>

On 10/15/2014 4:02 AM, David Miller wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Wed, 15 Oct 2014 10:44:11 +1100
>
>> Hi all,
>>
>> After merging the net tree, today's linux-next build (arm
>> multi_v7_defconfig) failed like this:
>>
>> drivers/built-in.o: In function `.LANCHOR0':
>> :(.rodata+0x6b764): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b828): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b8ec): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b9b0): undefined reference to `sti_gmac_data'
>>
>> Caused by commit 53b26b9bc9a5 ("stmmac: dwmac-sti: review the
>> glue-logic for STi4xx and STiD127 SoCs") which renamed sti_gmac_data to
>> stih4xx_dwmac_data (or something) without updating all the references
>> to it (including the one added in the previous commit ...).
>>
>> I reverted that commit for today.
>
> Sigh, Giuseppe if I don't see a proper fix by tomorrow I'm
> reverting all of the stmmac changes I applied today.

I am looking at this now and let you know soon

peppe

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Oliver Hartkopp @ 2014-10-15  4:59 UTC (permalink / raw)
  To: Pan Jiafei, davem, jkosina; +Cc: netdev, LeoLi, linux-doc
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

On 15.10.2014 05:26, Pan Jiafei wrote:
> In some platform, there are some hardware block provided
> to manage buffers to improve performance.

(..)

> diff --git a/net/Kconfig b/net/Kconfig
> index d6b138e..346e021 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
>   	  with many clients some protection against DoS by a single (spoofed)
>   	  flow that greatly exceeds average workload.
>
> +config USE_HW_SKB
> +	bool "NIC use hardware managed buffer to build skb"
> +	depends on INET

The feature seems to be valid for network devices in general.
Why did you make it depending on INET ??

Regards,
Oliver

> +	---help---
> +	  If select this, the third party drivers will use hardware managed
> +	  buffers to allocate SKB without any modification for the driver.
> +
> +	  Documentation on how to use this featue can be found at
> +	  <file:Documentation/networking/hw_skb.txt>.
> +
>   menu "Network testing"
>
>   config NET_PKTGEN


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox