public inbox for linux-pci@vger.kernel.org
From: Guixin Liu <kanie@linux.alibaba.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	bhelgaas@google.com, linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: optimize proc sequential file read
Date: Tue, 22 Oct 2024 10:21:46 +0800	[thread overview]
Message-ID: <ece5af0f-4512-4119-90d8-faa3af5bb4ef@linux.alibaba.com> (raw)
In-Reply-To: <2024102148-helium-elk-b100@gregkh>


On 2024/10/21 19:04, Greg KH wrote:
> On Mon, Oct 21, 2024 at 10:04:03AM +0800, Guixin Liu wrote:
>>> This proc interface feels inherently racy.  We keep track of the
>>> current item (n) in a struct seq_file, but I don't think there's
>>> anything to prevent a pci_dev hot-add or -remove between calls to
>>> pci_seq_start().
>> Yes, some information could be lost in that case.
>>> Is the proc interface the only place to get this information?  If
>>> there's a way to get it from sysfs, maybe that is better and we don't
>>> need to spend effort optimizing the less-desirable path?
>> This is the situation I encountered: in scenarios of rapid container
>> scaling, when a container starts it executes lscpu, which traverses
>> the /proc/bus/pci/devices file, or the container process traverses
>> this file directly. When many containers start at once, the locks on
>> the klist used to traverse the pci_dev list make numerous containers
>> wait, which greatly reduces the efficiency of container scaling and
>> can also leave other CPUs unresponsive.
>>
>> User-space programs, including Docker, are clients that we cannot easily
>> modify. Therefore, I attempted to accelerate pci_seq_start() within the
>> kernel. This does mean more code to maintain, as we must ensure both
>> fast access and ordering. Initially, I considered modifying the klist in
>> the driver base module directly, but such a change would impact other
>> drivers as well.
> I am not opposed to any driver core changes where we iterate over lists
> of objects on a bus, but usually that's not a real "hot path" that
> matters.  Also, you need to keep the lists in a specific order, which I
> do not think an xarray will do very well (or at least not the last time
> I looked at it, I could be wrong.)
>
> I understand your need to want to make 'lspci' faster, but by default,
> 'lspci' does not access the proc files, I thought it went through sysfs.
> Why not just fix the userspace tool instead if that's the real issue?
>
> Just because you can modify the kernel, doesn't mean you should, often
> the better solution is to fix userspace from doing something dumb, and
> if it is doing something dumb, then let it and have it deal with the
> consequences :)
>
> thanks,
>
> greg k-h

Actually, lscpu may access the PCI proc files, and it could iterate over
them three times; see the lscpu source in
lscpu_read_virtualization()->has_pci_device(). We previously optimized it
to combine those three iterations into one, as reflected in this PR:
https://github.com/util-linux/util-linux/pull/3177


Additionally, many of our clients' tools and programs also access the PCI
proc files, but we are not allowed to modify that part; they do not
provide us with the source code. Clients tend to perceive the kernel
performance we provide as subpar (that is why we have stopped giving
'lscpu' direct access to sysfs).

We have already implemented this optimization in our operating system and
achieved significant performance benefits. If this patch can be accepted
into the community, it would be great for more people to benefit from it.

Best Regards,
Guixin Liu



Thread overview: 12+ messages
2024-10-18  5:47 [PATCH] PCI: optimize proc sequential file read Guixin Liu
2024-10-18 22:22 ` Bjorn Helgaas
2024-10-21  2:04   ` Guixin Liu
2024-10-21 11:04     ` Greg KH
2024-10-22  2:21       ` Guixin Liu [this message]
2024-10-19  6:39 ` kernel test robot
2024-10-19 18:41 ` kernel test robot
2024-10-20  7:04 ` kernel test robot
2024-10-21  7:17 ` Dan Carpenter
2024-10-22  1:54   ` Guixin Liu
2024-10-22 15:44 ` Bjorn Helgaas
2024-10-24  3:42   ` Guixin Liu
