From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Michael Dalton <mwdalton@google.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
Date: Tue, 19 Nov 2013 23:53:12 +0200
Message-ID: <20131119215312.GE15004@redhat.com>
In-Reply-To: <1384896996.8604.120.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of page when we fail to allocate an skb for frag
> > > > list, otherwise it will be leaked. The bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net: migrate mergeable rx
> > > > buffers to page frag allocators").
> > > >
> > > > Cc: Michael Dalton <mwdalton@google.com>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > >
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > >
> > > Don't we also need to call virtqueue_get_buf() like
> > >
> > >     while (--num_buf) {
> > >         buf = virtqueue_get_buf(rq->vq, &len);
> > >         if (!buf)
> > >             break;
> > >         put_page(virt_to_head_page(buf));
> > >     }
> > >
> > > ?
> > >
> > >
> >
> >
> > Let me explain what worries me in your suggestion:
> >
> >     struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> >     if (unlikely(!nskb)) {
> >         head_skb->dev->stats.rx_dropped++;
> >         return -ENOMEM;
> >     }
> >
> > is this the failure case we are talking about?
>
> I thought Jason patch was about this, no ?
>
> >
> > I think this is a symptom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> >
> >
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
>
> How is it different with NIC processing in RX path ?
Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.
> >
> > It certainly violates the principle of least surprise:
> > when one sees host pass packet to guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > Guest stack can do with the packet what it sees fit.
> >
> > We actually wake up a thread if we can't fill up the queue,
> > that will fill it up in GFP_KERNEL context.
> >
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is required to avoid drops.
> >
>
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate sk_buff
> (256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
> payload of next packets anyway.
That's why we do:
    if (!try_fill_recv(rq, GFP_ATOMIC))
        schedule_delayed_work(&vi->refill, 0);
The queues are large enough for a single failure not to be
an immediate problem.
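To illustrate that fallback, here is a minimal user-space sketch. It is not the driver code: `struct ring`, `try_fill()`, `rx_poll_refill()` and `refill_work()` are hypothetical names, and `atomic_budget` is an invented knob that simulates how many GFP_ATOMIC allocations succeed before memory runs out. The deferred worker is assumed to always succeed because it may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a receive ring; not the virtio_net structures. */
struct ring {
    unsigned int cap;            /* number of buffer slots in the ring */
    unsigned int filled;         /* slots currently holding a buffer */
    unsigned int atomic_budget;  /* simulated: atomic allocations that still succeed */
    bool refill_scheduled;       /* models schedule_delayed_work() being called */
};

/* Fill the ring; returns false if an (atomic) allocation failed. */
static bool try_fill(struct ring *r, bool can_sleep)
{
    assert(r->filled <= r->cap);
    while (r->filled < r->cap) {
        if (!can_sleep) {
            if (r->atomic_budget == 0)
                return false;    /* out of memory in atomic context */
            r->atomic_budget--;
        }
        r->filled++;             /* sleeping allocations are assumed to succeed */
    }
    return true;
}

/* Receive path: try atomically, defer to the worker on failure. */
static void rx_poll_refill(struct ring *r)
{
    if (!try_fill(r, false))
        r->refill_scheduled = true;
}

/* Deferred worker: runs later in a context that is allowed to sleep. */
static void refill_work(struct ring *r)
{
    r->refill_scheduled = !try_fill(r, true);
}
```

In this model a failed atomic refill leaves the ring partly filled and only sets a flag; the remaining slots are topped up when the worker runs.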
> Same problem on a real NIC.
>
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but still need to drop
> packets under stress.
We don't even need a cache. A reserved skb is enough to avoid dropping
packets if allocation fails in the middle, so we don't dequeue a buffer
and then drop it.
Once we use this reserved skb, we stop processing the queue until
refill gives it back.
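A rough user-space sketch of that idea, under stated assumptions: `struct fake_skb`, `struct rx_queue`, `rx_get_skb()`, `rx_may_dequeue()` and `rx_refill()` are hypothetical names, `alloc_fails` is a knob that simulates a GFP_ATOMIC failure, and plain `calloc()` stands in for the sleeping refill allocation. It only models the control flow, not the actual driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct fake_skb { int id; };

/* Hypothetical per-queue state holding one spare skb. */
struct rx_queue {
    struct fake_skb *reserve;  /* pre-allocated spare, NULL once consumed */
    bool stalled;              /* true: stop dequeuing until refill runs */
    bool alloc_fails;          /* knob simulating GFP_ATOMIC failure */
};

/* Simulated atomic allocation that may fail. */
static struct fake_skb *atomic_alloc(struct rx_queue *rq)
{
    if (rq->alloc_fails)
        return NULL;
    return calloc(1, sizeof(struct fake_skb));
}

/* The receive path checks this before dequeuing another buffer. */
static bool rx_may_dequeue(const struct rx_queue *rq)
{
    return !rq->stalled;
}

/* Get an skb for a buffer that was already dequeued: never drops it. */
static struct fake_skb *rx_get_skb(struct rx_queue *rq)
{
    struct fake_skb *skb;

    /* Invariant: an unstalled queue always owns a reserve. */
    assert(!rq->stalled && rq->reserve != NULL);

    skb = atomic_alloc(rq);
    if (skb)
        return skb;

    /* Allocation failed mid-packet: hand out the reserve and stall. */
    skb = rq->reserve;
    rq->reserve = NULL;
    rq->stalled = true;
    return skb;
}

/* Refill path (may sleep): restore the reserve and resume processing. */
static bool rx_refill(struct rx_queue *rq)
{
    if (!rq->reserve) {
        rq->reserve = calloc(1, sizeof(*rq->reserve));
        if (!rq->reserve)
            return false;
    }
    rq->stalled = false;
    return true;
}
```

The point of the design is that the packet in flight always gets an skb; the cost of an allocation failure is a stalled queue rather than a dropped packet.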
> Pre-allocating skb on real NIC has a performance cost, because we clear
> sk_buff way ahead of time. By the time skb is finally received, cpu has
> to bring those cache lines back into its cache.
>
Alternatively, maybe we can pre-allocate the memory but avoid clearing it?
--
MST