virtualization.lists.linux-foundation.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Michael Dalton <mwdalton@google.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
Date: Tue, 19 Nov 2013 23:53:12 +0200
Message-ID: <20131119215312.GE15004@redhat.com>
In-Reply-To: <1384896996.8604.120.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > > list, otherwise it will be leaked. The bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > > buffers to page frag allocators").
> > > > 
> > > > Cc: Michael Dalton <mwdalton@google.com>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > > 
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > > 
> > > Don't we also need to call virtqueue_get_buf() like 
> > > 
> > > while (--num_buf) {
> > >     buf = virtqueue_get_buf(rq->vq, &len);
> > >     if (!buf)
> > >         break;
> > >     put_page(virt_to_head_page(buf));
> > > }
> > > 
> > > ?
> > > 
> > > 
> > 
> > 
> > Let me explain what worries me in your suggestion:
> > 
> >                         struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> >                         if (unlikely(!nskb)) {
> >                                 head_skb->dev->stats.rx_dropped++;
> >                                 return -ENOMEM;
> >                         }
> > 
> > is this the failure case we are talking about?
> 
> I thought Jason's patch was about this, no?
> 
> > 
> > I think this is a symptom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> > 
> > 
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
> 
> How is it different with NIC processing in RX path ?


Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.

> > 
> > It certainly violates the principle of least surprise:
> > when one sees host pass packet to guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > Guest stack can do with the packet what it sees fit.
> > 
> > We actually wake up a thread if we can't fill up the queue,
> > that will fill it up in GFP_KERNEL context.
> > 
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is required to avoid drops.
> > 
> 
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate an sk_buff
> (256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
> payload of the next packets anyway.

that's why we do:

                if (!try_fill_recv(rq, GFP_ATOMIC))
                        schedule_delayed_work(&vi->refill, 0);


the queues are large enough for a single failure not to be
an immediate problem.
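
A minimal userspace sketch of that fallback, for illustration only. It is
not the driver code: RING_SIZE, struct rx_ring and every model_* name are
hypothetical, and the deferred refill is simply assumed to succeed because
it runs in a context that may sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from virtio_net. */
#define RING_SIZE 8

struct rx_ring {
	size_t posted;         /* buffers currently posted for the host to fill */
	size_t atomic_budget;  /* how many atomic allocations will still succeed */
	bool refill_scheduled; /* stands in for schedule_delayed_work() */
};

/* Models a GFP_ATOMIC fill: stops at the first allocation failure.
 * Returns true only if the ring ended up full. */
static bool model_try_fill_atomic(struct rx_ring *r)
{
	while (r->posted < RING_SIZE) {
		if (r->atomic_budget == 0)
			return false;
		r->atomic_budget--;
		r->posted++;
	}
	return true;
}

/* Models the deferred refill worker. Assumption: it can sleep, so it is
 * treated here as always able to top the ring up. */
static void model_refill_work(struct rx_ring *r)
{
	r->posted = RING_SIZE;
	r->refill_scheduled = false;
}

/* Models the tail of the receive path: try the cheap atomic fill first and
 * only schedule the worker if the ring could not be filled. */
static void model_rx_poll_tail(struct rx_ring *r)
{
	if (!model_try_fill_atomic(r))
		r->refill_scheduled = true;
}
```

In this model a failed atomic fill leaves the ring partly populated, so
receive keeps working from the buffers already posted until the worker runs.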


> Same problem on a real NIC.
> 
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but you still need to drop
> packets under stress.

We don't need a cache even. Just enough to avoid dropping packets
if allocation failed in the middle so we don't dequeue a buffer and then
drop it.

Once we use this reserved skb, we stop processing the queue until
refill gives it back.
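
A rough userspace sketch of the single-reserve idea, under stated
assumptions: struct model_rq, struct model_skb, the fail_alloc knob and all
model_* functions are hypothetical, and calloc() stands in for both the
atomic and the sleeping allocation. The invariant is that the reserve is
present whenever the queue is not stopped, so a buffer that has already been
dequeued can always get an skb.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff. */
struct model_skb {
	int in_use;
};

struct model_rq {
	struct model_skb *reserve; /* one preallocated skb, NULL once consumed */
	bool stopped;              /* true: do not dequeue until refill runs */
	bool fail_alloc;           /* knob to simulate atomic allocation failure */
};

/* Models an atomic allocation that may fail. */
static struct model_skb *model_alloc_atomic(struct model_rq *rq)
{
	if (rq->fail_alloc)
		return NULL;
	return calloc(1, sizeof(struct model_skb));
}

/* Called for a buffer that was already dequeued, and only while the queue
 * is not stopped. Falls back to the reserve and stops the queue, so the
 * packet in hand is still delivered instead of being dropped. */
static struct model_skb *model_get_skb(struct model_rq *rq)
{
	struct model_skb *skb = model_alloc_atomic(rq);

	if (skb)
		return skb;

	skb = rq->reserve;
	rq->reserve = NULL;
	rq->stopped = true;
	return skb;
}

/* Models the refill worker: restore the reserve, then restart the queue.
 * Returns false if the reserve could not be restored. */
static bool model_refill(struct model_rq *rq)
{
	if (!rq->reserve) {
		rq->reserve = calloc(1, sizeof(*rq->reserve));
		if (!rq->reserve)
			return false;
	}
	rq->stopped = false;
	return true;
}
```

The cost in this model is one extra skb per receive queue rather than a
cache, and the queue stalls instead of dropping while the reserve is out.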

> Pre-allocating skbs on a real NIC has a performance cost, because we clear
> the sk_buff way ahead of time. By the time the skb is finally received, the
> cpu has to bring those cache lines back into its cache.
> 

Alternatively we can pre-allocate the memory but avoid clearing it maybe?
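
To make the question concrete, here is a small userspace sketch, assuming a
hypothetical struct model_skb_hdr and model_* helpers (this is not the
sk_buff layout or any kernel API). The memory is reserved early but left
uninitialized, and the clearing is deferred until the buffer is actually
used for a received packet.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; only loosely modelled on a few sk_buff fields. */
struct model_skb_hdr {
	unsigned int len;
	unsigned int data_len;
	void *frag_list;
};

/* Refill time: reserve the memory only and leave it uninitialized, so no
 * cache lines are touched long before the packet arrives. */
static struct model_skb_hdr *model_prealloc_raw(void)
{
	return malloc(sizeof(struct model_skb_hdr));
}

/* Receive time: clear the header right before first use, when the cache
 * lines are about to be written anyway. */
static struct model_skb_hdr *model_claim(struct model_skb_hdr *raw,
					 unsigned int len)
{
	memset(raw, 0, sizeof(*raw));
	raw->len = len;
	return raw;
}
```

Whether this recovers the cost described above would need measuring; the
sketch only shows where the clearing would move to.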

-- 
MST

Thread overview: 19+ messages
2013-11-19  8:05 [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb Jason Wang
2013-11-19 14:03 ` Eric Dumazet
2013-11-19 18:44   ` Michael S. Tsirkin
2013-11-19 20:49   ` Michael S. Tsirkin
2013-11-19 21:36     ` Eric Dumazet
2013-11-19 21:53       ` Michael S. Tsirkin [this message]
2013-11-19 22:00         ` Eric Dumazet
2013-11-20  1:34           ` Michael Dalton
2013-11-20  3:17             ` Jason Wang
2013-11-20  9:00             ` Michael S. Tsirkin
2013-11-20  8:58           ` Michael S. Tsirkin
2013-11-20 15:16             ` Eric Dumazet
2013-11-20 16:06               ` Michael S. Tsirkin
2013-11-20 16:14                 ` Eric Dumazet
2013-11-20 17:03                   ` Michael S. Tsirkin
2013-11-19 21:38     ` Michael Dalton
2013-11-20  9:06       ` Michael S. Tsirkin
2013-11-20  3:05     ` Jason Wang
2013-11-20  3:00   ` Jason Wang
