* Re: [PATCH] generic/484: Need another process to check record locks
From: Xiong Murphy Zhou @ 2018-05-22 2:36 UTC
To: Xiao Yang; +Cc: fstests, guaneryu, xzhou, jlayton
In-Reply-To: <1526881320-10983-1-git-send-email-yangx.jy@cn.fujitsu.com>
On Mon, May 21, 2018 at 01:42:00PM +0800, Xiao Yang wrote:
> 1) According to the fcntl(2) manpage, a single process always gets F_UNLCK in
> the l_type field when using fcntl(F_GETLK) to query an existing lock that it
> set itself, because it could convert the existing lock to a new lock
> unconditionally. So we need another process to check whether the lock exists.
I used to do that, but eventually I deleted it because we don't need
to check from another process in this case.
Thanks,
Xiong
>
> 2) Remove redundant exit(0).
>
> Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
> ---
> src/t_locks_execve.c | 24 ++++++++++++++++++------
> 1 file changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/src/t_locks_execve.c b/src/t_locks_execve.c
> index 9ad2dc3..d99d7de 100644
> --- a/src/t_locks_execve.c
> +++ b/src/t_locks_execve.c
> @@ -8,6 +8,7 @@
> #include <errno.h>
> #include <pthread.h>
> #include <unistd.h>
> +#include <sys/types.h>
> #include <sys/wait.h>
>
> static void err_exit(char *op, int errn)
> @@ -32,12 +33,24 @@ struct flock fl = {
>
> static void checklock(int fd)
> {
> - if (fcntl(fd, F_GETLK, &fl) < 0)
> - err_exit("getlk", errno);
> - if (fl.l_type == F_UNLCK) {
> - printf("record lock is not preserved across execve(2)\n");
> - exit(1);
> + pid_t pid;
> +
> + pid = fork();
> + if (pid < 0)
> + err_exit("fork", errno);
> +
> + if (!pid) {
> + if (fcntl(fd, F_GETLK, &fl) < 0)
> + err_exit("getlk", errno);
> + if (fl.l_type == F_UNLCK) {
> + printf("record lock is not preserved across execve(2)\n");
> + exit(1);
> + }
> + exit(0);
> }
> +
> + waitpid(pid, NULL, 0);
> +
> exit(0);
> }
>
> @@ -52,7 +65,6 @@ int main(int argc, char **argv)
> if (argc == 3) {
> fd = atoi(argv[2]);
> checklock(fd);
> - exit(0);
> }
>
> fd = open(argv[1], O_WRONLY|O_CREAT, 0755);
> --
> 1.8.3.1
>
>
>
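To illustrate the fcntl(2) behaviour quoted in 1), here is a minimal
standalone sketch. It is not part of t_locks_execve.c, and the helper names
(first_byte_wrlck, query_lock_type, compare_lock_views) are made up for this
example. It takes a write lock, queries it with F_GETLK from the owning
process, and then queries it again from a forked child. On Linux, and assuming
the file lives on a filesystem that supports POSIX record locks, the owner
should get F_UNLCK back while the child gets F_WRLCK.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Describe a write lock on the first byte of the file. */
static struct flock first_byte_wrlck(void)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 1,
	};
	return fl;
}

/* Return the l_type that F_GETLK reports to the caller, or -1 on error. */
static int query_lock_type(int fd)
{
	struct flock fl = first_byte_wrlck();

	if (fcntl(fd, F_GETLK, &fl) < 0)
		return -1;
	return fl.l_type;
}

/*
 * Lock fd, then record what the lock owner and a forked child each get
 * back from F_GETLK. Returns 0 on success, -1 on any failure.
 */
static int compare_lock_views(int fd, int *owner_view, int *child_view)
{
	struct flock fl = first_byte_wrlck();
	int status;
	pid_t pid;

	assert(owner_view && child_view);

	if (fcntl(fd, F_SETLK, &fl) < 0)
		return -1;
	*owner_view = query_lock_type(fd);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: report what it saw through its exit status. */
		int type = query_lock_type(fd);
		_exit(type == F_UNLCK ? 0 : type == F_WRLCK ? 1 : 2);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) > 1)
		return -1;
	*child_view = WEXITSTATUS(status) ? F_WRLCK : F_UNLCK;
	return 0;
}
```

Calling compare_lock_views() on a freshly created temporary file is enough to
see the two different answers. Whether that difference matters for generic/484
is exactly the question raised in this thread.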
* RE: [PATCH V2] gpio: mxc: add clock operation
From: Anson Huang @ 2018-05-22 2:35 UTC
To: Fabio Estevam
Cc: Linus Walleij, dl-linux-imx, linux-gpio@vger.kernel.org,
linux-kernel
In-Reply-To: <CAOMZO5Btvr7R7Ah_=LT6yDxRkJS=bi0bHVFN9ZHi=HiKeQP4sw@mail.gmail.com>
Anson Huang
Best Regards!
> -----Original Message-----
> From: Fabio Estevam [mailto:festevam@gmail.com]
> Sent: Tuesday, May 22, 2018 10:34 AM
> To: Anson Huang <anson.huang@nxp.com>
> Cc: Linus Walleij <linus.walleij@linaro.org>; dl-linux-imx <linux-imx@nxp.com>;
> linux-gpio@vger.kernel.org; linux-kernel <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH V2] gpio: mxc: add clock operation
>
> Hi Anson,
>
> On Mon, May 21, 2018 at 11:29 PM, Anson Huang <anson.huang@nxp.com>
> wrote:
>
> > Thanks, I will rework it into 2 patches, using SPDX.
>
> I have just sent a series that converts gpio-mxc and gpio-mxs to use SPDX
> identifier and put you on Cc.
>
> You can base your change on top of mine.
>
> Thanks
OK, thanks.
* Re: [PATCH V2] gpio: mxc: add clock operation
From: Fabio Estevam @ 2018-05-22 2:34 UTC
To: Anson Huang
Cc: Linus Walleij, dl-linux-imx, linux-gpio@vger.kernel.org,
linux-kernel
In-Reply-To: <AM3PR04MB13154C4D2EF4D42A16BF19E6F5940@AM3PR04MB1315.eurprd04.prod.outlook.com>
Hi Anson,
On Mon, May 21, 2018 at 11:29 PM, Anson Huang <anson.huang@nxp.com> wrote:
> Thanks, I will rework it into 2 patches, using SPDX.
I have just sent a series that converts gpio-mxc and gpio-mxs to use
SPDX identifier and put you on Cc.
You can base your change on top of mine.
Thanks
* Re: [meta-networking][PATCH] postgresql: remove *_config from SSTATE_SCAN_FILES
From: Kang Kai @ 2018-05-22 2:32 UTC
To: openembedded-devel
In-Reply-To: <20180522015922.21057-1-kai.kang@windriver.com>
Please ignore this one; its title carries the wrong layer prefix, meta-networking.
--Kai
On 2018-05-22 09:59, kai.kang@windriver.com wrote:
> From: Kai Kang <kai.kang@windriver.com>
>
> Running pg_config fails with a segmentation fault. The root cause is that
> function sstate_hardcode_path treats the ELF file pg_config as a
> configuration file and edits it with 'sed'.
>
> As a result, the file pg_config is corrupted:
> $ readelf -a package/usr/bin/pg_config >/dev/null
> readelf: Error: Unable to read in 0x700 bytes of section headers
> readelf: Error: Section headers are not available!
>
> No '*_config' file other than pg_config is installed by postgresql, so
> remove '*_config' from SSTATE_SCAN_FILES for postgresql.
>
> Signed-off-by: Kai Kang <kai.kang@windriver.com>
> ---
> meta-oe/recipes-dbs/postgresql/postgresql.inc | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/meta-oe/recipes-dbs/postgresql/postgresql.inc b/meta-oe/recipes-dbs/postgresql/postgresql.inc
> index 5462332c5..1301060ee 100644
> --- a/meta-oe/recipes-dbs/postgresql/postgresql.inc
> +++ b/meta-oe/recipes-dbs/postgresql/postgresql.inc
> @@ -202,6 +202,7 @@ do_install_append() {
> }
>
> SSTATE_SCAN_FILES += "Makefile.global"
> +SSTATE_SCAN_FILES_remove = "*_config"
>
> PACKAGES =+ "${PN}-client ${PN}-server-dev ${PN}-timezone \
> libecpg-compat-dbg libecpg-compat libecpg-compat-dev \
--
Regards,
Neil | Kai Kang
* Re: [Fuego] [PATCH] core: add log_this function
From: Daniel Sangorrin @ 2018-05-22 2:31 UTC
To: Tim.Bird, fuego
In-Reply-To: <003401d3f173$880701d0$98150570$@toshiba.co.jp>
> -----Original Message-----
> From: fuego-bounces@lists.linuxfoundation.org
> [mailto:fuego-bounces@lists.linuxfoundation.org] On Behalf Of Daniel Sangorrin
> Sent: Tuesday, May 22, 2018 11:21 AM
> To: Tim.Bird@sony.com; fuego@lists.linuxfoundation.org
> Subject: Re: [Fuego] [PATCH] core: add log_this function
>
> Hi Tim,
>
> I noticed that you are appending the log to testlog. It would be nice to add a
> separator such as:
> Fuego: board A testlog
> ...
> Fuego: host testlog
> ...
>
> Also I was thinking that from a systems perspective, it would be nice to be able to
> rename the log files.
>
> For example, suppose that we have a test that requires 4 boards (A, B, C, D) and the
> test is coordinated by board E (it could be "docker" or the host as well).
>
> Initially, we would run ftc run-test on board E (the coordinator). Then board-E's
> fuego_test.sh would execute
> ftc run-test -b boardname --log boardname.log -t ...
> on boards [A..D].
> # it would be nice to have a new category (like Benchmark and Functional) for
> monitoring tests that just check the disk usage or network status. Something like
> MONITORING.free (check memory usage) etc.
>
> Finally, during the post processing phase board E can merge the logs into one (using
> a separator) and give it to the parser.
Also, board E should merge the JSON results of each subtest (i.e. each ftc run-test it executed).
Probably the best approach would be to take the global result of each test (instead of the results of each test set or test case) and add it to board E's JSON file as a test case result, perhaps with a link to the detailed JSON file produced by each board.
>
> # Of course Board E should be able to use Fuego core functions for switching on the
> boards, waiting until all of them are ready etc.
>
> Thanks,
> Daniel
>
>
>
>
>
>
>
>
> > -----Original Message-----
> > From: fuego-bounces@lists.linuxfoundation.org
> > [mailto:fuego-bounces@lists.linuxfoundation.org] On Behalf Of
> > Tim.Bird@sony.com
> > Sent: Tuesday, May 22, 2018 3:40 AM
> > To: fuego@lists.linuxfoundation.org
> > Subject: [Fuego] [PATCH] core: add log_this function
> >
> > Hey Fuego-ans,
> >
> > Here is a patch that I applied to fuego-core last week. I've been doing
> > some thinking about some longstanding issues with tests that have a
> > host-side component to their data gathering. Based on this, and recent
> > discussions on the list, I implemented a new "log_this" function.
> > It's like the "report" function, but for a host-side command.
> >
> > I believe this will be a new, important architectural feature of Fuego.
> >
> > This is part of a broader effort to expand the scope of Fuego testing, from just
> > target-side testing, to more system-wide testing. It's clear that for some types of
> > hardware testing, additional off-DUT frameworks will need to be accessed,
> > and in some cases controlled. This new function "log_this" is the start of
> > support for logging the access to such non-DUT frameworks
> > (facilities, devices, harnesses, resources, etc.)
> >
> > I'm also thinking about what's needed to provide for generalized control
> > of such things. This is a tricky subject, due to the incredible fragmentation
> > there is in board control hardware, secondary resource control, and associated
> > driving software.
> > However, I'm considering implementing some kind of generic resource
> > reservation and management system (over the long run - this is not the highest
> > priority at the moment).
> >
> > In any event, here's the patch for this little bit, which is actually pretty simple...
> > --------------
> > Some tests need to get information and data from host-side
> > operations, which then need to be reported and analyzed by Fuego.
> >
> > The log_this function captures the output of commands executed
> > on the host, and puts it (ultimately) into the test log for a run.
> > Any command executed with "log_this" is saved during test execution,
> > and placed in the final testlog.txt, after any
> > board-side log data (from report and report_append) commands.
> >
> > There are several tests (especially Fuego self-tests) that could
> > use this feature, to avoid an awkward sequence of push-to-target,
> > and report-cat, to get log data from the host into the testlog.
> >
> > Signed-off-by: Tim Bird <tim.bird@sony.com>
> > ---
> > engine/scripts/functions.sh | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/engine/scripts/functions.sh b/engine/scripts/functions.sh
> > index 0b293db..8fabd85 100755
> > --- a/engine/scripts/functions.sh
> > +++ b/engine/scripts/functions.sh
> > @@ -226,6 +226,21 @@ function report_append {
> > return ${RESULT}
> > }
> >
> > +# $1 - local shell command
> > +function log_this {
> > + is_empty $1
> > +
> > + RETCODE=/tmp/$$-${RANDOM}
> > + touch $RETCODE
> > +
> > + { $1; echo $? > $RETCODE ; } 2>&1 | tee -a ${LOGDIR}/hostlog.txt
> > +
> > + RESULT=$(cat $RETCODE)
> > + rm -f $RETCODE
> > + export REPORT_RETURN_VALUE=${RESULT}
> > + return ${RESULT}
> > +}
> > +
> > function dump_syslogs {
> > # 1 - tmp dir, 2 - before/after
> >
> > @@ -466,6 +481,10 @@ function fetch_results {
> > get $BOARD_TESTDIR/fuego.$TESTDIR/$TESTDIR.log ${LOGDIR}/testlog.txt || \
> > echo "INFO: the test did not produce a test log on the target" | tee ${LOGDIR}/testlog.txt
> >
> > + if [ -f ${LOGDIR}/hostlog.txt ] ; then
> > + cat ${LOGDIR}/hostlog.txt >> ${LOGDIR}/testlog.txt
> > + fi
> > +
> > # Get syslogs
> > dump_syslogs ${fuego_test_tmp} "after"
> > get ${fuego_test_tmp}/${NODE_NAME}.${BUILD_ID}.${BUILD_NUMBER}.before ${LOGDIR}/syslog.before.txt
> > --
> > 2.1.4
> >
> > _______________________________________________
> > Fuego mailing list
> > Fuego@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/fuego
>
>
>
> _______________________________________________
> Fuego mailing list
> Fuego@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/fuego
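For readers more at home in C than in shell, the idea behind log_this can be
sketched roughly as follows: run a host-side command, copy its combined
stdout/stderr both to the console and to a log file, and hand the command's
exit status back to the caller. The quoted shell function needs a temporary
file for the status because it pipes through tee; with popen(3) the status
comes back from pclose(3) instead. This is only an illustration, not Fuego
code: log_host_command and its parameters are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Run cmd through the shell, append its stdout and stderr to logpath while
 * echoing them to stdout, and return the command's exit status (-1 on error).
 */
static int log_host_command(const char *cmd, const char *logpath)
{
	char shell_cmd[1024];
	char buf[4096];
	FILE *log, *out;
	size_t n;
	int len, status;

	assert(cmd && logpath);

	/* Group the command so that its stderr is captured as well. */
	len = snprintf(shell_cmd, sizeof(shell_cmd), "{ %s ; } 2>&1", cmd);
	if (len < 0 || (size_t)len >= sizeof(shell_cmd))
		return -1;

	log = fopen(logpath, "a");
	if (!log)
		return -1;
	out = popen(shell_cmd, "r");
	if (!out) {
		fclose(log);
		return -1;
	}

	/* Like "tee -a": echo to stdout and append to the log. */
	while ((n = fread(buf, 1, sizeof(buf), out)) > 0) {
		fwrite(buf, 1, n, stdout);
		fwrite(buf, 1, n, log);
	}

	status = pclose(out);
	fclose(log);
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A caller could then append the resulting host log to the test log after the
board-side data, in the same spirit as the fetch_results hunk above.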
* [meta-oe][PATCH] postgresql: remove *_config from SSTATE_SCAN_FILES
From: kai.kang @ 2018-05-22 2:00 UTC
To: openembedded-devel
From: Kai Kang <kai.kang@windriver.com>
Running pg_config fails with a segmentation fault. The root cause is that
function sstate_hardcode_path treats the ELF file pg_config as a
configuration file and edits it with 'sed'.
As a result, the file pg_config is corrupted:
$ readelf -a package/usr/bin/pg_config >/dev/null
readelf: Error: Unable to read in 0x700 bytes of section headers
readelf: Error: Section headers are not available!
No '*_config' file other than pg_config is installed by postgresql, so
remove '*_config' from SSTATE_SCAN_FILES for postgresql.
Signed-off-by: Kai Kang <kai.kang@windriver.com>
---
meta-oe/recipes-dbs/postgresql/postgresql.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/meta-oe/recipes-dbs/postgresql/postgresql.inc b/meta-oe/recipes-dbs/postgresql/postgresql.inc
index 5462332c5..1301060ee 100644
--- a/meta-oe/recipes-dbs/postgresql/postgresql.inc
+++ b/meta-oe/recipes-dbs/postgresql/postgresql.inc
@@ -202,6 +202,7 @@ do_install_append() {
}
SSTATE_SCAN_FILES += "Makefile.global"
+SSTATE_SCAN_FILES_remove = "*_config"
PACKAGES =+ "${PN}-client ${PN}-server-dev ${PN}-timezone \
libecpg-compat-dbg libecpg-compat libecpg-compat-dev \
--
2.14.1
* [meta-networking][PATCH] postgresql: remove *_config from SSTATE_SCAN_FILES
From: kai.kang @ 2018-05-22 1:59 UTC
To: openembedded-devel
From: Kai Kang <kai.kang@windriver.com>
Running pg_config fails with a segmentation fault. The root cause is that
function sstate_hardcode_path treats the ELF file pg_config as a
configuration file and edits it with 'sed'.
As a result, the file pg_config is corrupted:
$ readelf -a package/usr/bin/pg_config >/dev/null
readelf: Error: Unable to read in 0x700 bytes of section headers
readelf: Error: Section headers are not available!
No '*_config' file other than pg_config is installed by postgresql, so
remove '*_config' from SSTATE_SCAN_FILES for postgresql.
Signed-off-by: Kai Kang <kai.kang@windriver.com>
---
meta-oe/recipes-dbs/postgresql/postgresql.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/meta-oe/recipes-dbs/postgresql/postgresql.inc b/meta-oe/recipes-dbs/postgresql/postgresql.inc
index 5462332c5..1301060ee 100644
--- a/meta-oe/recipes-dbs/postgresql/postgresql.inc
+++ b/meta-oe/recipes-dbs/postgresql/postgresql.inc
@@ -202,6 +202,7 @@ do_install_append() {
}
SSTATE_SCAN_FILES += "Makefile.global"
+SSTATE_SCAN_FILES_remove = "*_config"
PACKAGES =+ "${PN}-client ${PN}-server-dev ${PN}-timezone \
libecpg-compat-dbg libecpg-compat libecpg-compat-dev \
--
2.14.1
* RE: [PATCH V2] gpio: mxc: add clock operation
From: Anson Huang @ 2018-05-22 2:29 UTC
To: Fabio Estevam
Cc: Linus Walleij, dl-linux-imx, linux-gpio@vger.kernel.org,
linux-kernel
In-Reply-To: <CAOMZO5BJy-jEKywyN515+5W-bmA01v7sWsgDj0=5wtBCqsHSYQ@mail.gmail.com>
Hi, Fabio
Anson Huang
Best Regards!
> -----Original Message-----
> From: Fabio Estevam [mailto:festevam@gmail.com]
> Sent: Tuesday, May 22, 2018 10:28 AM
> To: Anson Huang <anson.huang@nxp.com>
> Cc: Linus Walleij <linus.walleij@linaro.org>; dl-linux-imx <linux-imx@nxp.com>;
> linux-gpio@vger.kernel.org; linux-kernel <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH V2] gpio: mxc: add clock operation
>
> Hi Anson,
>
> On Mon, May 21, 2018 at 10:15 PM, Anson Huang <Anson.Huang@nxp.com>
> wrote:
> > Some i.MX SoCs, such as i.MX6SLL, have GPIO clock gates in CCM CCGR and
> > need the clocks enabled before GPIO registers are accessed. Add an
> > optional clock operation to the GPIO driver.
> >
> > Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
> > ---
> > changes since V1:
> > add missing clk header;
> > remove FSF addresses in copyright to avoid check patch ERROR.
> > drivers/gpio/gpio-mxc.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpio/gpio-mxc.c b/drivers/gpio/gpio-mxc.c
> > index 11ec722..2026f94 100644
> > --- a/drivers/gpio/gpio-mxc.c
> > +++ b/drivers/gpio/gpio-mxc.c
> > @@ -14,12 +14,9 @@
> > * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > * GNU General Public License for more details.
> > - *
> > - * You should have received a copy of the GNU General Public License
> > - * along with this program; if not, write to the Free Software
> > - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> > */
>
> This is an unrelated change and should be part of a different patch.
>
> A patch that converts this driver to use SPDX would get rid of the FSF address.
Thanks, I will rework it into 2 patches, using SPDX.
Anson.
* [RFC PATCH 2/3] blk-mq: Fix timeout and state order
From: Ming Lei @ 2018-05-22 2:28 UTC
In-Reply-To: <20180521231131.6685-3-keith.busch@intel.com>
On Mon, May 21, 2018 at 05:11:30PM -0600, Keith Busch wrote:
> The block layer had been setting the state to in-flight prior to updating
> the timer. This is the wrong order since the timeout handler could observe
> the in-flight state with the older timeout, believing the request had
> expired when in fact it is just getting started.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8b370ed75605..66e5c768803f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -713,8 +713,8 @@ void blk_mq_start_request(struct request *rq)
> preempt_disable();
> write_seqcount_begin(&rq->gstate_seq);
>
> - blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> blk_add_timer(rq);
> + blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
>
> write_seqcount_end(&rq->gstate_seq);
> preempt_enable();
> --
> 2.14.3
>
Looks fine,
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [RFC PATCH 2/3] blk-mq: Fix timeout and state order
From: Ming Lei @ 2018-05-22 2:28 UTC
To: Keith Busch
Cc: Jens Axboe, linux-nvme, linux-block, Christoph Hellwig,
Bart Van Assche
In-Reply-To: <20180521231131.6685-3-keith.busch@intel.com>
On Mon, May 21, 2018 at 05:11:30PM -0600, Keith Busch wrote:
> The block layer had been setting the state to in-flight prior to updating
> the timer. This is the wrong order since the timeout handler could observe
> the in-flight state with the older timeout, believing the request had
> expired when in fact it is just getting started.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8b370ed75605..66e5c768803f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -713,8 +713,8 @@ void blk_mq_start_request(struct request *rq)
> preempt_disable();
> write_seqcount_begin(&rq->gstate_seq);
>
> - blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> blk_add_timer(rq);
> + blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
>
> write_seqcount_end(&rq->gstate_seq);
> preempt_enable();
> --
> 2.14.3
>
Looks fine,
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH V2] gpio: mxc: add clock operation
From: Fabio Estevam @ 2018-05-22 2:27 UTC
To: Anson Huang; +Cc: Linus Walleij, NXP Linux Team, linux-gpio, linux-kernel
In-Reply-To: <1526951717-12347-1-git-send-email-Anson.Huang@nxp.com>
Hi Anson,
On Mon, May 21, 2018 at 10:15 PM, Anson Huang <Anson.Huang@nxp.com> wrote:
> Some i.MX SoCs, such as i.MX6SLL, have GPIO clock gates in CCM CCGR and
> need the clocks enabled before GPIO registers are accessed. Add an
> optional clock operation to the GPIO driver.
>
> Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
> ---
> changes since V1:
> add missing clk header;
> remove FSF addresses in copyright to avoid check patch ERROR.
> drivers/gpio/gpio-mxc.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpio/gpio-mxc.c b/drivers/gpio/gpio-mxc.c
> index 11ec722..2026f94 100644
> --- a/drivers/gpio/gpio-mxc.c
> +++ b/drivers/gpio/gpio-mxc.c
> @@ -14,12 +14,9 @@
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> */
This is an unrelated change and should be part of a different patch.
A patch that converts this driver to use SPDX would get rid of the FSF address.
* [RFC PATCH 1/3] blk-mq: Reference count request usage
From: Ming Lei @ 2018-05-22 2:27 UTC
In-Reply-To: <20180521231131.6685-2-keith.busch@intel.com>
On Mon, May 21, 2018 at 05:11:29PM -0600, Keith Busch wrote:
> This patch adds a struct kref to struct request so that request users
> can be sure they're operating on the same request without it changing
> while they're processing it. The request's tag won't be released for
> reuse until the last user is done with it.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq.c | 30 +++++++++++++++++++++++-------
> include/linux/blkdev.h | 2 ++
> 2 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4cbfd784e837..8b370ed75605 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -332,6 +332,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
> #endif
>
> data->ctx->rq_dispatched[op_is_sync(op)]++;
> + kref_init(&rq->ref);
> return rq;
> }
>
> @@ -465,13 +466,33 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> }
> EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
>
> +static void blk_mq_exit_request(struct kref *ref)
> +{
> + struct request *rq = container_of(ref, struct request, ref);
> + struct request_queue *q = rq->q;
> + struct blk_mq_ctx *ctx = rq->mq_ctx;
> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
> + const int sched_tag = rq->internal_tag;
> +
> + if (rq->tag != -1)
> + blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
> + if (sched_tag != -1)
> + blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> + blk_mq_sched_restart(hctx);
> + blk_queue_exit(q);
> +}
> +
> +static void blk_mq_put_request(struct request *rq)
> +{
> + kref_put(&rq->ref, blk_mq_exit_request);
> +}
> +
> void blk_mq_free_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
> struct elevator_queue *e = q->elevator;
> struct blk_mq_ctx *ctx = rq->mq_ctx;
> struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
> - const int sched_tag = rq->internal_tag;
>
> if (rq->rq_flags & RQF_ELVPRIV) {
> if (e && e->type->ops.mq.finish_request)
> @@ -495,12 +516,7 @@ void blk_mq_free_request(struct request *rq)
> blk_put_rl(blk_rq_rl(rq));
>
> blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
> - if (rq->tag != -1)
> - blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
> - if (sched_tag != -1)
> - blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> - blk_mq_sched_restart(hctx);
> - blk_queue_exit(q);
> + blk_mq_put_request(rq);
Both the line above (atomic_try_cmpxchg_release is implied) and kref_init()
in blk_mq_rq_ctx_init() run in the fast path and may introduce some cost;
you may need to run some benchmarks to show whether there is any effect.
Also, given that the cost isn't free, could you describe a bit in the commit log
what we get in return for it?
Thanks,
Ming
* Re: [RFC PATCH 1/3] blk-mq: Reference count request usage
From: Ming Lei @ 2018-05-22 2:27 UTC
To: Keith Busch
Cc: Jens Axboe, linux-nvme, linux-block, Christoph Hellwig,
Bart Van Assche
In-Reply-To: <20180521231131.6685-2-keith.busch@intel.com>
On Mon, May 21, 2018 at 05:11:29PM -0600, Keith Busch wrote:
> This patch adds a struct kref to struct request so that request users
> can be sure they're operating on the same request without it changing
> while they're processing it. The request's tag won't be released for
> reuse until the last user is done with it.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq.c | 30 +++++++++++++++++++++++-------
> include/linux/blkdev.h | 2 ++
> 2 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4cbfd784e837..8b370ed75605 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -332,6 +332,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
> #endif
>
> data->ctx->rq_dispatched[op_is_sync(op)]++;
> + kref_init(&rq->ref);
> return rq;
> }
>
> @@ -465,13 +466,33 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> }
> EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
>
> +static void blk_mq_exit_request(struct kref *ref)
> +{
> + struct request *rq = container_of(ref, struct request, ref);
> + struct request_queue *q = rq->q;
> + struct blk_mq_ctx *ctx = rq->mq_ctx;
> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
> + const int sched_tag = rq->internal_tag;
> +
> + if (rq->tag != -1)
> + blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
> + if (sched_tag != -1)
> + blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> + blk_mq_sched_restart(hctx);
> + blk_queue_exit(q);
> +}
> +
> +static void blk_mq_put_request(struct request *rq)
> +{
> + kref_put(&rq->ref, blk_mq_exit_request);
> +}
> +
> void blk_mq_free_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
> struct elevator_queue *e = q->elevator;
> struct blk_mq_ctx *ctx = rq->mq_ctx;
> struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
> - const int sched_tag = rq->internal_tag;
>
> if (rq->rq_flags & RQF_ELVPRIV) {
> if (e && e->type->ops.mq.finish_request)
> @@ -495,12 +516,7 @@ void blk_mq_free_request(struct request *rq)
> blk_put_rl(blk_rq_rl(rq));
>
> blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
> - if (rq->tag != -1)
> - blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
> - if (sched_tag != -1)
> - blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> - blk_mq_sched_restart(hctx);
> - blk_queue_exit(q);
> + blk_mq_put_request(rq);
Both the line above (atomic_try_cmpxchg_release is implied) and kref_init()
in blk_mq_rq_ctx_init() run in the fast path and may introduce some cost;
you may need to run some benchmarks to show whether there is any effect.
Also, given that the cost isn't free, could you describe a bit in the commit log
what we get in return for it?
Thanks,
Ming
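As a userspace illustration of the pattern under review, here is a small
reference-count sketch with C11 atomics. It is not the kernel's struct kref:
the toy_* names are hypothetical and the memory ordering is a simplification.
It shows the two properties discussed above: the release callback (which gives
the tag back) runs only when the last user drops its reference, and every put
costs an atomic read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_ref {
	atomic_int count;
};

struct toy_request {
	int tag;		/* -1 means the tag has been released */
	struct toy_ref ref;
};

/* A new object starts with one reference, held by its creator. */
static void toy_ref_init(struct toy_ref *r)
{
	atomic_init(&r->count, 1);
}

static void toy_ref_get(struct toy_ref *r)
{
	atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
}

/* Drop one reference; returns 1 if this call ran the release callback. */
static int toy_ref_put(struct toy_ref *r, void (*release)(struct toy_ref *))
{
	assert(release != NULL);
	if (atomic_fetch_sub_explicit(&r->count, 1,
				      memory_order_acq_rel) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Release callback: recover the request and give its tag back. */
static void toy_request_release(struct toy_ref *r)
{
	struct toy_request *rq = (struct toy_request *)
		((char *)r - offsetof(struct toy_request, ref));

	rq->tag = -1;
}
```

If a second user (say, a timeout handler) takes a reference with
toy_ref_get(), the owner's toy_ref_put() leaves the tag in place, and only the
second put releases it. The atomic operation in each put is the fast-path cost
that the benchmark request is about.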
* Re: [PATCH 4.14 00/95] 4.14.43-stable review
From: kernelci.org bot @ 2018-05-22 2:26 UTC
To: Greg Kroah-Hartman, linux-kernel
Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
ben.hutchings, lkft-triage, stable
In-Reply-To: <20180521210447.219380974@linuxfoundation.org>
stable-rc/linux-4.14.y boot: 116 boots: 0 failed, 111 passed with 5 offline (v4.14.42-96-gb98076ba9976)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14.42-96-gb98076ba9976/
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.42-96-gb98076ba9976/
Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.42-96-gb98076ba9976
Git Commit: b98076ba9976b9ea99d2595ee3af9f94a0c0d22c
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 60 unique boards, 23 SoC families, 15 builds out of 185
Offline Platforms:
arm:
tegra_defconfig:
tegra30-beaver: 1 offline lab
multi_v7_defconfig:
armada-xp-openblocks-ax3-4: 1 offline lab
tegra30-beaver: 1 offline lab
zynq-zc702: 1 offline lab
mvebu_v7_defconfig:
armada-xp-openblocks-ax3-4: 1 offline lab
---
For more info write to <info@kernelci.org>
* Re: [PATCH 0/2] bind update req. new pkg in core
From: Khem Raj @ 2018-05-22 2:25 UTC
To: akuster808; +Cc: Patches and discussions about the oe-core layer
In-Reply-To: <8523df99-ce1f-c99e-0379-4b72df9d58ad@gmail.com>
On Mon, May 21, 2018 at 9:30 PM, akuster808 <akuster808@gmail.com> wrote:
>
>
> On 05/21/2018 06:06 PM, Khem Raj wrote:
>
>
> On Mon, May 21, 2018 at 3:44 PM Armin Kuster <akuster808@gmail.com> wrote:
>>
>> With this update, the python-ply package is required.
>> I copied the one from meta-python to core. Once this hits master,
>> I will send a patch to remove the same recipe from
>> meta-python.
>
>
> Can this be turned on/off using packageconfig ?
>
> Yes we can and it is already in the PACKAGECONFIG options.
>
> If we do disable python, then the python binaries that are currently being
> installed will no longer be. Someone decided that dnssec-coverage and
> dnssec-checkds were important to install in the current Bind solution. A new
> one is being added: dnssec-keymgr. If we don't want dnssec support, that's
> easy to exclude.
>
These all seem to be additional utilities, so they should probably be
packaged into a separate package, maybe bind-utils.
> BTW, python-ply exists in 4 other layers besides meta-python and at
> different versions.
That's less than ideal, and they should probably be consolidated; moving it to
oe-core probably will not solve that problem.
>
> let me know which way you want to go.
I don't have a strong opinion; either way is fine.
> - armin
>
>
>
>
>
> If so then it would be ideal
>
> If not then please send the removal patches for meta-python regardless so it
> can be tested together
>>
>>
>>
>> Armin Kuster (2):
>> bind: update to 9.12.1
>> python3-ply: add package needed by bind 9.12 update
>>
>> ...0001-build-use-pkg-config-to-find-libxml2.patch | 13 +++---
>> ...-gen.c-extend-DIRNAMESIZE-from-256-to-512.patch | 13 +++---
>> .../0001-lib-dns-gen.c-fix-too-long-error.patch | 13 +++---
>> .../bind/bind/bind-confgen-build-unix.o-once.patch | 48
>> ----------------------
>> ...-searching-for-json-headers-searches-sysr.patch | 13 +++---
>> .../bind/bind/dont-test-on-host.patch | 17 --------
>> .../use-python3-and-fix-install-lib-path.patch | 36 ----------------
>> .../bind/{bind_9.10.6.bb => bind_9.12.1.bb} | 21 ++++------
>> meta/recipes-devtools/python/python-ply.inc | 18 ++++++++
>> meta/recipes-devtools/python/python-ply_3.11.bb | 2 +
>> 10 files changed, 49 insertions(+), 145 deletions(-)
>> delete mode 100644
>> meta/recipes-connectivity/bind/bind/bind-confgen-build-unix.o-once.patch
>> delete mode 100644
>> meta/recipes-connectivity/bind/bind/dont-test-on-host.patch
>> delete mode 100644
>> meta/recipes-connectivity/bind/bind/use-python3-and-fix-install-lib-path.patch
>> rename meta/recipes-connectivity/bind/{bind_9.10.6.bb => bind_9.12.1.bb}
>> (86%)
>> create mode 100644 meta/recipes-devtools/python/python-ply.inc
>> create mode 100644 meta/recipes-devtools/python/python-ply_3.11.bb
>>
>> --
>> 2.7.4
>>
>> --
>> _______________________________________________
>> Openembedded-core mailing list
>> Openembedded-core@lists.openembedded.org
>> http://lists.openembedded.org/mailman/listinfo/openembedded-core
>
>
* Re: [PATCH 05/15] drm/sun4i: Add TCON TOP driver
From: kbuild test robot @ 2018-05-22 2:25 UTC
To: Jernej Skrabec
Cc: kbuild-all, maxime.ripard, wens, robh+dt, mark.rutland, dri-devel,
devicetree, linux-arm-kernel, linux-kernel, linux-clk,
linux-sunxi
In-Reply-To: <20180519183127.2718-6-jernej.skrabec@siol.net>
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]
Hi Jernej,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on drm/drm-next]
[also build test ERROR on v4.17-rc6 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Jernej-Skrabec/Add-support-for-R40-HDMI-pipeline/20180521-131839
base: git://people.freedesktop.org/~airlied/linux.git drm-next
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
All errors (new ones prefixed by >>):
drivers/gpu/drm/sun4i/sun8i_tcon_top.o: In function `init_module':
>> sun8i_tcon_top.c:(.init.text+0x0): multiple definition of `init_module'
drivers/gpu/drm/sun4i/sun8i_mixer.o:sun8i_mixer.c:(.init.text+0x0): first defined here
drivers/gpu/drm/sun4i/sun8i_tcon_top.o: In function `cleanup_module':
>> sun8i_tcon_top.c:(.exit.text+0x0): multiple definition of `cleanup_module'
drivers/gpu/drm/sun4i/sun8i_mixer.o:sun8i_mixer.c:(.exit.text+0x0): first defined here
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43346 bytes --]
^ permalink raw reply
* [PATCH v3 net-next 1/2] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-22 2:22 UTC (permalink / raw)
To: David S . Miller
Cc: daniel, torvalds, gregkh, luto, mcgrof, keescook, netdev,
linux-kernel, kernel-team
In-Reply-To: <20180522022230.2492505-1-ast@kernel.org>
Introduce helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
struct umh_info {
struct file *pipe_to_umh;
struct file *pipe_from_umh;
pid_t pid;
};
that GPLed kernel modules (signed or unsigned) can use to execute part
of their own data as a swappable user mode process.
The kernel will do:
- allocate a unique file in tmpfs
- populate that file with [data, data + len] bytes
- user-mode-helper code will do_execve that file and, before the process
starts, the kernel will create two unix pipes for bidirectional
communication between kernel module and umh
- close tmpfs file, effectively deleting it
- the fork_usermode_blob will return zero on success and populate
'struct umh_info' with two unix pipes and the pid of the user process
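For readers who want to try the flow outside the kernel, the steps above
can be approximated in user space. This is only a rough sketch under stated
assumptions, not the kernel implementation: memfd_create() stands in for the
tmpfs file, fexecve() for do_execve_file(), and the names spawn_blob() and
run_copy_of() are hypothetical. The two pipes are left out to keep it short,
and it assumes a Linux system where memfds are allowed to be executed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy len bytes into fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			return -errno;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical analogue of fork_usermode_blob(): start the ELF image in
 * [data, data + len) as a child process. Returns the child's pid or a
 * negative errno.
 */
pid_t spawn_blob(const void *data, size_t len, char *const argv[])
{
	char *const envp[] = { NULL };
	pid_t pid;
	int fd, err;

	assert(data && argv);
	/* anonymous in-memory file, standing in for the tmpfs file */
	fd = memfd_create("blob", MFD_CLOEXEC);
	if (fd < 0)
		return -errno;
	/* populate the file with the blob */
	err = write_all(fd, data, len);
	if (err) {
		close(fd);
		return err;
	}
	pid = fork();
	if (pid == 0) {
		fexecve(fd, argv, envp);	/* only returns on failure */
		_exit(127);
	}
	err = pid < 0 ? -errno : 0;
	/* drop our reference; a running child keeps the image alive */
	close(fd);
	return err ? err : pid;
}

/* Demo only: treat an on-disk executable as the "embedded" blob. */
int run_copy_of(const char *path)
{
	char *const argv[] = { (char *)path, NULL };
	struct stat st;
	void *image;
	pid_t pid;
	int fd, status;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (fstat(fd, &st) < 0) {
		status = -errno;
		close(fd);
		return status;
	}
	image = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	status = image == MAP_FAILED ? -errno : 0;
	close(fd);
	if (status)
		return status;
	pid = spawn_blob(image, (size_t)st.st_size, argv);
	munmap(image, (size_t)st.st_size);
	if (pid < 0)
		return (int)pid;
	if (waitpid(pid, &status, 0) < 0)
		return -errno;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Calling run_copy_of("/bin/true") should run a private in-memory copy of that
binary and report its exit status. Unlike the kernel helper, the sketch takes
an argv so that multi-call binaries still see a sensible argv[0].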
As the first step in the development of the bpfilter project
the fork_usermode_blob() helper is introduced to allow user mode code
to be invoked from a kernel module. The idea is that user mode code plus
normal kernel module code are built as part of the kernel build
and installed as traditional kernel module into distro specified location,
such that from a distribution point of view, there is
no difference between regular kernel modules and kernel modules + umh code.
Such modules can be signed, modprobed, rmmod-ed, etc. Using this new helper
does not make a kernel module special from the point of view of kernel or
user space tooling.
This approach enables the kernel to delegate functionality traditionally done
by kernel modules to user space processes (either root or !root) and
reduces the security attack surface of the new code. Buggy umh code would crash
the user process, but not the kernel. Another advantage is that the umh code
of the kernel module can be debugged and tested from user space
(e.g. opening the possibility to run clang sanitizers, fuzzers or
user space test suites on the umh code).
In the case of the bpfilter project, this architecture allows a complex control
plane to be implemented in user space while the bpf based data plane stays in
the kernel.
Since a umh can crash, be oom-killed by the kernel, or be killed by the admin,
the kernel module that uses it (like bpfilter) needs to manage the lifetime
of the umh on its own via the two unix pipes and the pid of the umh.
The exit code of such a kernel module should kill the umh it started,
so that rmmod of the kernel module will clean up the corresponding umh,
just as a kernel module that does kmalloc() should kfree() the memory
in its exit code.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
fs/exec.c | 38 +++++++++++----
include/linux/binfmts.h | 1 +
include/linux/umh.h | 12 +++++
kernel/umh.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++--
4 files changed, 164 insertions(+), 12 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 183059c427b9..30a36c2a39bf 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
/*
* sys_execve() executes a new program.
*/
-static int do_execveat_common(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags)
+static int __do_execve_file(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags, struct file *file)
{
char *pathbuf = NULL;
struct linux_binprm *bprm;
- struct file *file;
struct files_struct *displaced;
int retval;
@@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
check_unsafe_exec(bprm);
current->in_execve = 1;
- file = do_open_execat(fd, filename, flags);
+ if (!file)
+ file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
sched_exec();
bprm->file = file;
- if (fd == AT_FDCWD || filename->name[0] == '/') {
+ if (!filename) {
+ bprm->filename = "none";
+ } else if (fd == AT_FDCWD || filename->name[0] == '/') {
bprm->filename = filename->name;
} else {
if (filename->name[0] == '\0')
@@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
task_numa_free(current);
free_bprm(bprm);
kfree(pathbuf);
- putname(filename);
+ if (filename)
+ putname(filename);
if (displaced)
put_files_struct(displaced);
return retval;
@@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
if (displaced)
reset_files_struct(displaced);
out_ret:
- putname(filename);
+ if (filename)
+ putname(filename);
return retval;
}
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
+{
+ return __do_execve_file(fd, filename, argv, envp, flags, NULL);
+}
+
+int do_execve_file(struct file *file, void *__argv, void *__envp)
+{
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+
+ return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
+}
+
int do_execve(struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 4955e0863b83..c05f24fac4f6 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
const char __user * const __user *,
const char __user * const __user *,
int);
+int do_execve_file(struct file *file, void *__argv, void *__envp);
#endif /* _LINUX_BINFMTS_H */
diff --git a/include/linux/umh.h b/include/linux/umh.h
index 244aff638220..5c812acbb80a 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -22,8 +22,10 @@ struct subprocess_info {
const char *path;
char **argv;
char **envp;
+ struct file *file;
int wait;
int retval;
+ pid_t pid;
int (*init)(struct subprocess_info *info, struct cred *new);
void (*cleanup)(struct subprocess_info *info);
void *data;
@@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
int (*init)(struct subprocess_info *info, struct cred *new),
void (*cleanup)(struct subprocess_info *), void *data);
+struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
+ int (*init)(struct subprocess_info *info, struct cred *new),
+ void (*cleanup)(struct subprocess_info *), void *data);
+struct umh_info {
+ struct file *pipe_to_umh;
+ struct file *pipe_from_umh;
+ pid_t pid;
+};
+int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
+
extern int
call_usermodehelper_exec(struct subprocess_info *info, int wait);
diff --git a/kernel/umh.c b/kernel/umh.c
index f76b3ff876cf..30db93fd7e39 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -25,6 +25,8 @@
#include <linux/ptrace.h>
#include <linux/async.h>
#include <linux/uaccess.h>
+#include <linux/shmem_fs.h>
+#include <linux/pipe_fs_i.h>
#include <trace/events/module.h>
@@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
commit_creds(new);
- retval = do_execve(getname_kernel(sub_info->path),
- (const char __user *const __user *)sub_info->argv,
- (const char __user *const __user *)sub_info->envp);
+ if (sub_info->file)
+ retval = do_execve_file(sub_info->file,
+ sub_info->argv, sub_info->envp);
+ else
+ retval = do_execve(getname_kernel(sub_info->path),
+ (const char __user *const __user *)sub_info->argv,
+ (const char __user *const __user *)sub_info->envp);
out:
sub_info->retval = retval;
/*
@@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
if (pid < 0) {
sub_info->retval = pid;
umh_complete(sub_info);
+ } else {
+ sub_info->pid = pid;
}
}
}
@@ -393,6 +401,117 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
}
EXPORT_SYMBOL(call_usermodehelper_setup);
+struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
+ int (*init)(struct subprocess_info *info, struct cred *new),
+ void (*cleanup)(struct subprocess_info *info), void *data)
+{
+ struct subprocess_info *sub_info;
+
+ sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
+ if (!sub_info)
+ return NULL;
+
+ INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
+ sub_info->path = "none";
+ sub_info->file = file;
+ sub_info->init = init;
+ sub_info->cleanup = cleanup;
+ sub_info->data = data;
+ return sub_info;
+}
+
+static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
+{
+ struct umh_info *umh_info = info->data;
+ struct file *from_umh[2];
+ struct file *to_umh[2];
+ int err;
+
+ /* create pipe to send data to umh */
+ err = create_pipe_files(to_umh, 0);
+ if (err)
+ return err;
+ err = replace_fd(0, to_umh[0], 0);
+ fput(to_umh[0]);
+ if (err < 0) {
+ fput(to_umh[1]);
+ return err;
+ }
+
+ /* create pipe to receive data from umh */
+ err = create_pipe_files(from_umh, 0);
+ if (err) {
+ fput(to_umh[1]);
+ replace_fd(0, NULL, 0);
+ return err;
+ }
+ err = replace_fd(1, from_umh[1], 0);
+ fput(from_umh[1]);
+ if (err < 0) {
+ fput(to_umh[1]);
+ replace_fd(0, NULL, 0);
+ fput(from_umh[0]);
+ return err;
+ }
+
+ umh_info->pipe_to_umh = to_umh[1];
+ umh_info->pipe_from_umh = from_umh[0];
+ return 0;
+}
+
+static void umh_save_pid(struct subprocess_info *info)
+{
+ struct umh_info *umh_info = info->data;
+
+ umh_info->pid = info->pid;
+}
+
+/**
+ * fork_usermode_blob - fork a blob of bytes as a usermode process
+ * @data: a blob of bytes that can be do_execv-ed as a file
+ * @len: length of the blob
+ * @info: information about usermode process (shouldn't be NULL)
+ *
+ * Returns either negative error or zero which indicates success
+ * in executing a blob of bytes as a usermode process. In such
+ * case 'struct umh_info *info' is populated with two pipes
+ * and a pid of the process. The caller is responsible for health
+ * check of the user process, killing it via pid, and closing the
+ * pipes when user process is no longer needed.
+ */
+int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
+{
+ struct subprocess_info *sub_info;
+ struct file *file;
+ ssize_t written;
+ loff_t pos = 0;
+ int err;
+
+ file = shmem_kernel_file_setup("", len, 0);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ written = kernel_write(file, data, len, &pos);
+ if (written != len) {
+ err = written;
+ if (err >= 0)
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = -ENOMEM;
+ sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
+ umh_save_pid, info);
+ if (!sub_info)
+ goto out;
+
+ err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
+out:
+ fput(file);
+ return err;
+}
+EXPORT_SYMBOL_GPL(fork_usermode_blob);
+
/**
* call_usermodehelper_exec - start a usermode application
* @sub_info: information about the subprocessa
--
2.9.5
^ permalink raw reply related
* [PATCH v3 net-next 0/2] bpfilter
From: Alexei Starovoitov @ 2018-05-22 2:22 UTC (permalink / raw)
To: David S . Miller
Cc: daniel, torvalds, gregkh, luto, mcgrof, keescook, netdev,
linux-kernel, kernel-team
Hi All,
v2->v3:
- followed Luis's suggestion and significantly simplified the first patch
with shmem_kernel_file_setup+kernel_write. Added kdoc for the new helper
- fixed typos and a race in accessing the pipes by adding a mutex
- tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y|m both work.
Interesting to see a usermode executable being embedded inside vmlinux.
- it doesn't hurt to enable bpfilter in .config.
ip_setsockopt commands are sent to usermode via pipes and -ENOPROTOOPT is
returned from userspace, so the kernel falls back to the original iptables code
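The fallback in the last item can be pictured with a small user-space model.
This is a simplified sketch, not the code from the patch: the model_*
functions and the legacy_calls counter are hypothetical stand-ins, and only
the option numbers and the -ENOPROTOOPT convention are taken from patch 2/2.

```c
#include <assert.h>
#include <errno.h>

/* Option numbers as in include/uapi/linux/bpfilter.h (patch 2/2). */
enum {
	BPFILTER_IPT_SO_SET_REPLACE = 64,
	BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
	BPFILTER_IPT_SET_MAX,
};

static int legacy_calls;	/* how often the fallback path ran */

/* Stand-in for the umh skeleton: every set command is unsupported. */
static int model_umh_set(int optname)
{
	(void)optname;
	return -ENOPROTOOPT;
}

/* Stand-in for the existing iptables sockopt handler. */
static int model_iptables_set(int optname)
{
	(void)optname;
	legacy_calls++;
	return 0;
}

/* Simplified model of the dispatch order in ip_setsockopt(). */
int model_ip_setsockopt(int optname)
{
	/* plain IP options are handled directly; others are not recognised */
	int err = optname < BPFILTER_IPT_SO_SET_REPLACE ? 0 : -ENOPROTOOPT;

	if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
	    optname < BPFILTER_IPT_SET_MAX)
		err = model_umh_set(optname);	/* bpfilter gets the first look */
	if (err == -ENOPROTOOPT)
		err = model_iptables_set(optname);	/* fall back */
	return err;
}
```

With the skeleton umh rejecting everything, a call such as
model_ip_setsockopt(BPFILTER_IPT_SO_SET_REPLACE) ends up in the legacy
handler, which is why enabling bpfilter is harmless at this stage.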
v1->v2:
this patch set is almost a full rewrite of the earlier umh modules approach.
The v1 patches and the follow-up discussion were covered by LWN:
https://lwn.net/Articles/749108/
I believe v2 addresses all issues brought up by Andy and others.
Mainly, there are zero changes to kernel/module.c.
Instead of teaching the module loading logic to recognize a special
umh module, let normal kernel modules execute part of their own
.init.rodata as a new user space process (Andy's idea).
Patch 1 introduces this new helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
Input:
data + len == executable file
Output:
struct umh_info {
struct file *pipe_to_umh;
struct file *pipe_from_umh;
pid_t pid;
};
Advantages vs v1:
- the embedded user mode executable is stored as .init.rodata inside a
normal kernel module. These pages are freed when the .ko finishes loading
- the elf file is copied into a tmpfs file. The user mode process is swappable.
- the communication between the user mode process and the 'parent' kernel
module is done via two unix pipes, hence the protocol is not exposed to
user space
- impossible to launch the umh on its own (that was the main issue of v1)
and impossible to man-in-the-middle because of the pipes
- bpfilter.ko consists of a tiny kernel part that passes the data
between the kernel and the umh via pipes and a much bigger umh part that
does all the work
- 'lsmod' shows bpfilter.ko as usual.
'rmmod bpfilter' removes kernel module and kills corresponding umh
- signed bpfilter.ko covers the whole image including umh code
Few issues:
- the user can still attach to the process and debug it with
'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work.
(a bit worse compared to v1)
- tinyconfig will notice a small increase in .text
+766 | TEXT | 7c8b94806bec umh: introduce fork_usermode_blob() helper
Alexei Starovoitov (2):
umh: introduce fork_usermode_blob() helper
net: add skeleton of bpfilter kernel module
fs/exec.c | 38 ++++++++++---
include/linux/binfmts.h | 1 +
include/linux/bpfilter.h | 15 +++++
include/linux/umh.h | 12 ++++
include/uapi/linux/bpfilter.h | 21 +++++++
kernel/umh.c | 125 +++++++++++++++++++++++++++++++++++++++++-
net/Kconfig | 2 +
net/Makefile | 1 +
net/bpfilter/Kconfig | 16 ++++++
net/bpfilter/Makefile | 30 ++++++++++
net/bpfilter/bpfilter_kern.c | 111 +++++++++++++++++++++++++++++++++++++
net/bpfilter/main.c | 63 +++++++++++++++++++++
net/bpfilter/msgfmt.h | 17 ++++++
net/ipv4/Makefile | 2 +
net/ipv4/bpfilter/Makefile | 2 +
net/ipv4/bpfilter/sockopt.c | 42 ++++++++++++++
net/ipv4/ip_sockglue.c | 17 ++++++
17 files changed, 503 insertions(+), 12 deletions(-)
create mode 100644 include/linux/bpfilter.h
create mode 100644 include/uapi/linux/bpfilter.h
create mode 100644 net/bpfilter/Kconfig
create mode 100644 net/bpfilter/Makefile
create mode 100644 net/bpfilter/bpfilter_kern.c
create mode 100644 net/bpfilter/main.c
create mode 100644 net/bpfilter/msgfmt.h
create mode 100644 net/ipv4/bpfilter/Makefile
create mode 100644 net/ipv4/bpfilter/sockopt.c
--
2.9.5
^ permalink raw reply
* [PATCH v3 net-next 2/2] net: add skeleton of bpfilter kernel module
From: Alexei Starovoitov @ 2018-05-22 2:22 UTC (permalink / raw)
To: David S . Miller
Cc: daniel, torvalds, gregkh, luto, mcgrof, keescook, netdev,
linux-kernel, kernel-team
In-Reply-To: <20180522022230.2492505-1-ast@kernel.org>
bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
and user mode helper code that is embedded into bpfilter.ko.
The steps to build bpfilter.ko are the following:
- main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
- with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
is converted into bpfilter_umh.o object file
with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
Example:
$ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
- bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
bpfilter_kern.c is normal kernel module code that calls
the fork_usermode_blob() helper to execute part of its own data
as a user mode process.
Notice that the _binary_net_bpfilter_bpfilter_umh_start .. _end range
is placed into the .init.rodata section, so it's freed as soon as the __init
function of bpfilter.ko has finished.
As part of __init, bpfilter.ko does a first request/reply exchange
via the two unix pipes provided by the fork_usermode_blob() helper to
make sure that the umh is healthy. If not, it will kill it via its pid.
Later bpfilter_process_sockopt() will be called from bpfilter hooks
in get/setsockopt() to pass iptables commands into the umh via bpfilter.ko.
If the admin does 'rmmod bpfilter', the __exit code of bpfilter.ko will
kill the umh as well.
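The request/reply exchange and the health check can be tried out entirely in
user space. The sketch below is an illustration under assumptions, not the
module code: a forked child plays the umh, the struct layouts follow
net/bpfilter/msgfmt.h with <stdint.h> types, and struct helper plus the
helper_*() functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same field layout as net/bpfilter/msgfmt.h. */
struct mbox_request {
	uint64_t addr;
	uint32_t len;
	uint32_t is_set;
	uint32_t cmd;
	uint32_t pid;
};

struct mbox_reply {
	uint32_t status;
};

/* Hypothetical user-space mirror of struct umh_info. */
struct helper {
	int to_helper;		/* write end of the request pipe */
	int from_helper;	/* read end of the reply pipe */
	pid_t pid;
};

/* Helper side: modelled on loop() in net/bpfilter/main.c. */
static void helper_loop(void)
{
	struct mbox_request req;
	struct mbox_reply reply;

	while (read(0, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
		/* "get" with cmd 0 is the health check, the rest is unsupported */
		reply.status = (!req.is_set && req.cmd == 0) ?
			       0 : (uint32_t)-ENOPROTOOPT;
		if (write(1, &reply, sizeof(reply)) != (ssize_t)sizeof(reply))
			break;
	}
}

/* Create both pipes, fork the helper and wire the pipes to its fds 0 and 1. */
int helper_start(struct helper *h)
{
	int to[2], from[2], err;

	if (pipe(to) < 0)
		return -errno;
	if (pipe(from) < 0) {
		err = -errno;
		close(to[0]);
		close(to[1]);
		return err;
	}
	h->pid = fork();
	if (h->pid == 0) {
		dup2(to[0], 0);
		dup2(from[1], 1);
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		helper_loop();
		_exit(0);
	}
	err = h->pid < 0 ? -errno : 0;
	close(to[0]);
	close(from[1]);
	if (err) {
		close(to[1]);
		close(from[0]);
		return err;
	}
	h->to_helper = to[1];
	h->from_helper = from[0];
	return 0;
}

/* One request/reply round trip; -EFAULT on a short read or write. */
int helper_request(struct helper *h, uint32_t cmd, int is_set)
{
	struct mbox_request req = {
		.cmd = cmd, .is_set = !!is_set, .pid = (uint32_t)getpid(),
	};
	struct mbox_reply reply;

	if (write(h->to_helper, &req, sizeof(req)) != (ssize_t)sizeof(req))
		return -EFAULT;
	if (read(h->from_helper, &reply, sizeof(reply)) != (ssize_t)sizeof(reply))
		return -EFAULT;
	return (int)reply.status;
}

/* Counterpart of shutdown_umh(): kill the helper and close both pipes. */
void helper_stop(struct helper *h)
{
	kill(h->pid, SIGKILL);
	waitpid(h->pid, NULL, 0);
	close(h->to_helper);
	close(h->from_helper);
}
```

After helper_start(), helper_request(&h, 0, 0) plays the role of the health
check done in load_umh(), and any set command comes back as -ENOPROTOOPT,
matching the skeleton handlers in main.c.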
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpfilter.h | 15 ++++++
include/uapi/linux/bpfilter.h | 21 ++++++++
net/Kconfig | 2 +
net/Makefile | 1 +
net/bpfilter/Kconfig | 16 ++++++
net/bpfilter/Makefile | 30 ++++++++++++
net/bpfilter/bpfilter_kern.c | 111 ++++++++++++++++++++++++++++++++++++++++++
net/bpfilter/main.c | 63 ++++++++++++++++++++++++
net/bpfilter/msgfmt.h | 17 +++++++
net/ipv4/Makefile | 2 +
net/ipv4/bpfilter/Makefile | 2 +
net/ipv4/bpfilter/sockopt.c | 42 ++++++++++++++++
net/ipv4/ip_sockglue.c | 17 +++++++
13 files changed, 339 insertions(+)
create mode 100644 include/linux/bpfilter.h
create mode 100644 include/uapi/linux/bpfilter.h
create mode 100644 net/bpfilter/Kconfig
create mode 100644 net/bpfilter/Makefile
create mode 100644 net/bpfilter/bpfilter_kern.c
create mode 100644 net/bpfilter/main.c
create mode 100644 net/bpfilter/msgfmt.h
create mode 100644 net/ipv4/bpfilter/Makefile
create mode 100644 net/ipv4/bpfilter/sockopt.c
diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
new file mode 100644
index 000000000000..687b1760bb9f
--- /dev/null
+++ b/include/linux/bpfilter.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BPFILTER_H
+#define _LINUX_BPFILTER_H
+
+#include <uapi/linux/bpfilter.h>
+
+struct sock;
+int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char *optval,
+ unsigned int optlen);
+int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char *optval,
+ int *optlen);
+extern int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set);
+#endif
diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
new file mode 100644
index 000000000000..2ec3cc99ea4c
--- /dev/null
+++ b/include/uapi/linux/bpfilter.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_LINUX_BPFILTER_H
+#define _UAPI_LINUX_BPFILTER_H
+
+#include <linux/if.h>
+
+enum {
+ BPFILTER_IPT_SO_SET_REPLACE = 64,
+ BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
+ BPFILTER_IPT_SET_MAX,
+};
+
+enum {
+ BPFILTER_IPT_SO_GET_INFO = 64,
+ BPFILTER_IPT_SO_GET_ENTRIES = 65,
+ BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
+ BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
+ BPFILTER_IPT_GET_MAX,
+};
+
+#endif /* _UAPI_LINUX_BPFILTER_H */
diff --git a/net/Kconfig b/net/Kconfig
index df8d45ef47d8..ba554cedb615 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -202,6 +202,8 @@ source "net/bridge/netfilter/Kconfig"
endif
+source "net/bpfilter/Kconfig"
+
source "net/dccp/Kconfig"
source "net/sctp/Kconfig"
source "net/rds/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index 77aaddedbd29..bdaf53925acd 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_TLS) += tls/
obj-$(CONFIG_XFRM) += xfrm/
obj-$(CONFIG_UNIX) += unix/
obj-$(CONFIG_NET) += ipv6/
+obj-$(CONFIG_BPFILTER) += bpfilter/
obj-$(CONFIG_PACKET) += packet/
obj-$(CONFIG_NET_KEY) += key/
obj-$(CONFIG_BRIDGE) += bridge/
diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
new file mode 100644
index 000000000000..60725c5f79db
--- /dev/null
+++ b/net/bpfilter/Kconfig
@@ -0,0 +1,16 @@
+menuconfig BPFILTER
+ bool "BPF based packet filtering framework (BPFILTER)"
+ default n
+ depends on NET && BPF
+ help
+ This builds experimental bpfilter framework that is aiming to
+ provide netfilter compatible functionality via BPF
+
+if BPFILTER
+config BPFILTER_UMH
+ tristate "bpfilter kernel module with user mode helper"
+ default m
+ help
+ This builds bpfilter kernel module with embedded user mode helper
+endif
+
diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
new file mode 100644
index 000000000000..2af752c8ef5e
--- /dev/null
+++ b/net/bpfilter/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Linux BPFILTER layer.
+#
+
+hostprogs-y := bpfilter_umh
+bpfilter_umh-objs := main.o
+HOSTCFLAGS += -I. -Itools/include/
+ifeq ($(CONFIG_BPFILTER_UMH), y)
+# builtin bpfilter_umh should be compiled with -static
+# since rootfs isn't mounted at the time the __init
+# function is called and do_execv won't find the elf interpreter
+HOSTLDFLAGS += -static
+endif
+
+# a bit of elf magic to convert bpfilter_umh binary into a binary blob
+# inside bpfilter_umh.o elf file referenced by
+# _binary_net_bpfilter_bpfilter_umh_start symbol
+# which bpfilter_kern.c passes further into umh blob loader at run-time
+quiet_cmd_copy_umh = GEN $@
+ cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
+ $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
+ -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
+ --rename-section .data=.init.rodata $< $@
+
+$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
+ $(call cmd,copy_umh)
+
+obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
+bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
new file mode 100644
index 000000000000..7596314b61c7
--- /dev/null
+++ b/net/bpfilter/bpfilter_kern.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/umh.h>
+#include <linux/bpfilter.h>
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include "msgfmt.h"
+
+#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
+#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
+
+extern char UMH_start;
+extern char UMH_end;
+
+static struct umh_info info;
+/* since ip_getsockopt() can run in parallel, serialize access to umh */
+static DEFINE_MUTEX(bpfilter_lock);
+
+static void shutdown_umh(struct umh_info *info)
+{
+ struct task_struct *tsk;
+
+ tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
+ if (tsk)
+ force_sig(SIGKILL, tsk);
+ fput(info->pipe_to_umh);
+ fput(info->pipe_from_umh);
+}
+
+static void __stop_umh(void)
+{
+ if (bpfilter_process_sockopt) {
+ bpfilter_process_sockopt = NULL;
+ shutdown_umh(&info);
+ }
+}
+
+static void stop_umh(void)
+{
+ mutex_lock(&bpfilter_lock);
+ __stop_umh();
+ mutex_unlock(&bpfilter_lock);
+}
+
+static int __bpfilter_process_sockopt(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set)
+{
+ struct mbox_request req;
+ struct mbox_reply reply;
+ loff_t pos = 0;
+ ssize_t n;
+ int ret;
+
+ req.is_set = is_set;
+ req.pid = current->pid;
+ req.cmd = optname;
+ req.addr = (long)optval;
+ req.len = optlen;
+ mutex_lock(&bpfilter_lock);
+ n = __kernel_write(info.pipe_to_umh, &req, sizeof(req), &pos);
+ if (n != sizeof(req)) {
+ pr_err("write fail %zd\n", n);
+ __stop_umh();
+ ret = -EFAULT;
+ goto out;
+ }
+ pos = 0;
+ n = kernel_read(info.pipe_from_umh, &reply, sizeof(reply), &pos);
+ if (n != sizeof(reply)) {
+ pr_err("read fail %zd\n", n);
+ __stop_umh();
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = reply.status;
+out:
+ mutex_unlock(&bpfilter_lock);
+ return ret;
+}
+
+static int __init load_umh(void)
+{
+ int err;
+
+ /* fork usermode process */
+ err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
+ if (err)
+ return err;
+ pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
+
+ /* health check that usermode process started correctly */
+ if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
+ stop_umh();
+ return -EFAULT;
+ }
+ bpfilter_process_sockopt = &__bpfilter_process_sockopt;
+ return 0;
+}
+
+static void __exit fini_umh(void)
+{
+ stop_umh();
+}
+module_init(load_umh);
+module_exit(fini_umh);
+MODULE_LICENSE("GPL");
diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
new file mode 100644
index 000000000000..81bbc1684896
--- /dev/null
+++ b/net/bpfilter/main.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sys/uio.h>
+#include <errno.h>
+#include <stdio.h>
+#include <sys/socket.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include "include/uapi/linux/bpf.h"
+#include <asm/unistd.h>
+#include "msgfmt.h"
+
+int debug_fd;
+
+static int handle_get_cmd(struct mbox_request *cmd)
+{
+ switch (cmd->cmd) {
+ case 0:
+ return 0;
+ default:
+ break;
+ }
+ return -ENOPROTOOPT;
+}
+
+static int handle_set_cmd(struct mbox_request *cmd)
+{
+ return -ENOPROTOOPT;
+}
+
+static void loop(void)
+{
+ while (1) {
+ struct mbox_request req;
+ struct mbox_reply reply;
+ int n;
+
+ n = read(0, &req, sizeof(req));
+ if (n != sizeof(req)) {
+ dprintf(debug_fd, "invalid request %d\n", n);
+ return;
+ }
+
+ reply.status = req.is_set ?
+ handle_set_cmd(&req) :
+ handle_get_cmd(&req);
+
+ n = write(1, &reply, sizeof(reply));
+ if (n != sizeof(reply)) {
+ dprintf(debug_fd, "reply failed %d\n", n);
+ return;
+ }
+ }
+}
+
+int main(void)
+{
+ debug_fd = open("/dev/console", 00000002 | 00000100);
+ dprintf(debug_fd, "Started bpfilter\n");
+ loop();
+ close(debug_fd);
+ return 0;
+}
diff --git a/net/bpfilter/msgfmt.h b/net/bpfilter/msgfmt.h
new file mode 100644
index 000000000000..98d121c62945
--- /dev/null
+++ b/net/bpfilter/msgfmt.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NET_BPFILTER_MSGFMT_H
+#define _NET_BPFILTER_MSGFMT_H
+
+struct mbox_request {
+ __u64 addr;
+ __u32 len;
+ __u32 is_set;
+ __u32 cmd;
+ __u32 pid;
+};
+
+struct mbox_reply {
+ __u32 status;
+};
+
+#endif
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520f9133..7018f91c5a39 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -16,6 +16,8 @@ obj-y := route.o inetpeer.o protocol.o \
inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
metrics.o
+obj-$(CONFIG_BPFILTER) += bpfilter/
+
obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
obj-$(CONFIG_PROC_FS) += proc.o
diff --git a/net/ipv4/bpfilter/Makefile b/net/ipv4/bpfilter/Makefile
new file mode 100644
index 000000000000..ce262d76cc48
--- /dev/null
+++ b/net/ipv4/bpfilter/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_BPFILTER) += sockopt.o
+
diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
new file mode 100644
index 000000000000..42a96d2d8d05
--- /dev/null
+++ b/net/ipv4/bpfilter/sockopt.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/uaccess.h>
+#include <linux/bpfilter.h>
+#include <uapi/linux/bpf.h>
+#include <linux/wait.h>
+#include <linux/kmod.h>
+
+int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set);
+EXPORT_SYMBOL_GPL(bpfilter_process_sockopt);
+
+int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval,
+ unsigned int optlen, bool is_set)
+{
+ if (!bpfilter_process_sockopt) {
+ int err = request_module("bpfilter");
+
+ if (err)
+ return err;
+ if (!bpfilter_process_sockopt)
+ return -ECHILD;
+ }
+ return bpfilter_process_sockopt(sk, optname, optval, optlen, is_set);
+}
+
+int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
+ unsigned int optlen)
+{
+ return bpfilter_mbox_request(sk, optname, optval, optlen, true);
+}
+
+int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
+ int __user *optlen)
+{
+ int len;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+
+ return bpfilter_mbox_request(sk, optname, optval, len, false);
+}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5ad2d8ed3a3f..e0791faacb24 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -47,6 +47,8 @@
#include <linux/errqueue.h>
#include <linux/uaccess.h>
+#include <linux/bpfilter.h>
+
/*
* SOL_IP control messages.
*/
@@ -1244,6 +1246,11 @@ int ip_setsockopt(struct sock *sk, int level,
return -ENOPROTOOPT;
err = do_ip_setsockopt(sk, level, optname, optval, optlen);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
+ optname < BPFILTER_IPT_SET_MAX)
+ err = bpfilter_ip_set_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
@@ -1552,6 +1559,11 @@ int ip_getsockopt(struct sock *sk, int level,
int err;
err = do_ip_getsockopt(sk, level, optname, optval, optlen, 0);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_GET_INFO &&
+ optname < BPFILTER_IPT_GET_MAX)
+ err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
@@ -1584,6 +1596,11 @@ int compat_ip_getsockopt(struct sock *sk, int level, int optname,
err = do_ip_getsockopt(sk, level, optname, optval, optlen,
MSG_CMSG_COMPAT);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_GET_INFO &&
+ optname < BPFILTER_IPT_GET_MAX)
+ err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
--
2.9.5
* Re: [PATCH] common/rc: add the function _require_noatime
From: Eryu Guan @ 2018-05-22 2:21 UTC (permalink / raw)
To: Steve French; +Cc: Xiaoli Feng, CIFS, fstests
In-Reply-To: <CAH2r5mt3f61eYobvEF6pwaJ0eNCiuXuNMfRq3_wi2=O=31Y1Qg@mail.gmail.com>
On Mon, May 21, 2018 at 06:54:45PM -0500, Steve French wrote:
> Should this be fixed by changing cifs.ko to accept the mount parm but ignore it?
>
> If this test works on NFS (the noatime mount option has no meaning for
> NFS apparently) we should do the same
NFS works because tests that call _require_atime will _notrun on NFS.
>
> Quoting the NFS man page:
>
> In particular, the atime/noatime, diratime/nodiratime, relatime/norela‐
> time, and strictatime/nostrictatime mount options have no effect on NFS
> mounts.
Yeah, the NFS check in _require_atime was based on this description. If
atime-related mount options have no effect on CIFS either, we could simply,
from fstests' point of view, _notrun for CIFS in _require_atime too.
Thanks,
Eryu
>
> On Mon, May 21, 2018 at 3:50 AM, Xiaoli Feng <xifeng@redhat.com> wrote:
> >
> > [add linux-cifs@vger.kernel.org to cc list]
> >
> > ----- Forwarded Message -----
> > From: "Eryu Guan" <guaneryu@gmail.com>
> > To: "XiaoLi Feng" <xifeng@redhat.com>
> > Cc: fstests@vger.kernel.org
> > Sent: Monday, May 21, 2018 3:50:27 PM
> > Subject: Re: [PATCH] common/rc: add the function _require_noatime
> >
> > [add linux-cifs@vger.kernel.org to cc list]
> >
> > On Fri, May 18, 2018 at 07:10:29PM +0800, Xiaoli Feng wrote:
> >> From: xiaoli feng <xifeng@redhat.com>
> >>
> >> In the generic/120, it will make the test not-pass if the filesystem
> >> mounts failed with noatime. Now change this result to norun. The
> >> filesystem cifs doesn't support noatime. Just make the test norun
> >> until it supports noatime.
> >>
> >> Signed-off-by: xiaoli feng <xifeng@redhat.com>
> >> ---
> >> common/rc | 8 +++++++-
> >> tests/generic/120 | 7 +------
> >> 2 files changed, 8 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/common/rc b/common/rc
> >> index ffe5323..9c45f1b 100644
> >> --- a/common/rc
> >> +++ b/common/rc
> >> @@ -3244,7 +3244,13 @@ _require_atime()
> >> _exclude_scratch_mount_option "noatime"
> >> if [ "$FSTYP" == "nfs" ]; then
> >> _notrun "atime related mount options have no effect on NFS"
> >> - fi
> >
> > I'm not sure what's the expected behavior from CIFS on atime/noatime, so
> > I added linux-cifs list for input.
> >
> > If CIFS behaves similarly to NFS, looks like that you could simply add
> > another check for cifs in _require_atime(), as what we already do for
> > nfs, so all tests that _require_atime() will _notrun on cifs.
> >
> >> +}
> >> +
> >> +_require_noatime()
> >> +{
> >> + _exclude_scratch_mount_option "atime"
> >> + _try_scratch_mount -o noatime || \
> >> + _notrun "noatime not supported by the current tested filesystem"
> >> }
> >>
> >> _require_relatime()
> >> diff --git a/tests/generic/120 b/tests/generic/120
> >> index 1180c10..ddd61b3 100755
> >> --- a/tests/generic/120
> >> +++ b/tests/generic/120
> >> @@ -60,12 +60,7 @@ _compare_access_times()
> >>
> >> }
> >>
> >> -if ! _try_scratch_mount "-o noatime" >$tmp.out 2>&1
> >> -then
> >> - cat $tmp.out
> >> - echo "!!! mount failed"
> >> - exit
> >> -fi
> >> +_require_noatime
> >
> > Anyway, failing the test when "noatime" mount fails is one of the
> > purposes of the test, and it shouldn't be removed, as we've already made
> > sure current FSTYP supports atime/noatime (by _require rules), so a
> > noatime mount failure indicates a bug in the filesystem.
> >
> > Thanks,
> > Eryu
> >
> >>
> >> #executable file
> >> echo "*** copying file ***"
> >> --
> >> 1.8.3.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe fstests" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Thanks,
>
> Steve
* Re: [Fuego] [PATCH] core: add log_this function
From: Daniel Sangorrin @ 2018-05-22 2:21 UTC (permalink / raw)
To: Tim.Bird, fuego
In-Reply-To: <ECADFF3FD767C149AD96A924E7EA6EAF7C14CE50@USCULXMSG01.am.sony.com>
Hi Tim,
I noticed that you are appending the log to testlog. It would be nice to add a separator such as:
Fuego: board A testlog
...
Fuego: host testlog
...
Also I was thinking that from a systems perspective, it would be nice to be able to rename the log files.
For example, suppose that we have a test that requires 4 boards (A, B, C, D) and the test is coordinated by board E (it could be "docker" or the host as well).
Initially, we would run ftc run-test on board E (the coordinator). Then board-E's fuego_test.sh would execute
ftc run-test -b boardname --log boardname.log -t ...
on boards [A..D].
# it would be nice to have a new category (like Benchmark and Functional) for monitoring tests that just check the disk usage or network status. Something like MONITORING.free (check memory usage) etc.
Finally, during the post processing phase board E can merge the logs into one (using a separator) and give it to the parser.
# Of course Board E should be able to use Fuego core functions for switching on the boards, waiting until all of them are ready etc.
Thanks,
Daniel
> -----Original Message-----
> From: fuego-bounces@lists.linuxfoundation.org
> [mailto:fuego-bounces@lists.linuxfoundation.org] On Behalf Of
> Tim.Bird@sony.com
> Sent: Tuesday, May 22, 2018 3:40 AM
> To: fuego@lists.linuxfoundation.org
> Subject: [Fuego] [PATCH] core: add log_this function
>
> Hey Fuego-ans,
>
> Here is a patch that I applied to fuego-core last week. I've been doing
> some thinking about some longstanding issues with tests that have a
> host-side component to their data gathering. Based on this, and recent
> discussions on the list, I implemented a new "log_this" function.
> It's like the "report" function, but for a host-side command.
>
> I believe this will be a new, important architectural feature of Fuego.
>
> This is part of a broader effort to expand the scope of Fuego testing, from just
> target-side testing, to more system-wide testing. It's clear that for some types of
> hardware testing, additional off-DUT frameworks will need to be accessed,
> and in some cases controlled. This new function "log_this" is the start of
> support for logging the access to such non-DUT frameworks
> (facilities, devices, harnesses, resources, etc.)
>
> I'm also thinking about what's needed to provide for generalized control
> of such things. This is a tricky subject, due to the incredible fragmentation
> there is in board control hardware, secondary resource control, and associated
> driving software.
> However, I'm considering implementing some kind of generic resource
> reservation and management system (over the long run - this is not the highest
> priority at the moment).
>
> In any event, here's the patch for this little bit, which is actually pretty simple...
> --------------
> Some tests need to get information and data from host-side
> operations, that needs to be reported and analyzed by Fuego.
>
> The log_this function captures the output of commands executed
> on the host, and puts it (ultimately) into the test log for a run.
> Any command executed with "log_this" is saved during test execution,
> and placed in the final testlog.txt, after any
> board-side log data (from report and report_append) commands.
>
> There are several tests (especially Fuego self-tests) that could
> use this feature, to avoid an awkward sequence of push-to-target,
> and report-cat, to get log data from the host into the testlog.
>
> Signed-off-by: Tim Bird <tim.bird@sony.com>
> ---
> engine/scripts/functions.sh | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/engine/scripts/functions.sh b/engine/scripts/functions.sh
> index 0b293db..8fabd85 100755
> --- a/engine/scripts/functions.sh
> +++ b/engine/scripts/functions.sh
> @@ -226,6 +226,21 @@ function report_append {
> return ${RESULT}
> }
>
> +# $1 - local shell command
> +function log_this {
> + is_empty $1
> +
> + RETCODE=/tmp/$$-${RANDOM}
> + touch $RETCODE
> +
> + { $1; echo $? > $RETCODE ; } 2>&1 | tee -a ${LOGDIR}/hostlog.txt
> +
> + RESULT=$(cat $RETCODE)
> + rm -f $RETCODE
> + export REPORT_RETURN_VALUE=${RESULT}
> + return ${RESULT}
> +}
> +
> function dump_syslogs {
> # 1 - tmp dir, 2 - before/after
>
> @@ -466,6 +481,10 @@ function fetch_results {
> get $BOARD_TESTDIR/fuego.$TESTDIR/$TESTDIR.log ${LOGDIR}/testlog.txt
> || \
> echo "INFO: the test did not produce a test log on the target" | tee
> ${LOGDIR}/testlog.txt
>
> + if [ -f ${LOGDIR}/hostlog.txt ] ; then
> + cat ${LOGDIR}/hostlog.txt >> ${LOGDIR}/testlog.txt
> + fi
> +
> # Get syslogs
> dump_syslogs ${fuego_test_tmp} "after"
> get
> ${fuego_test_tmp}/${NODE_NAME}.${BUILD_ID}.${BUILD_NUMBER}.before
> ${LOGDIR}/syslog.before.txt
> --
> 2.1.4
>
> _______________________________________________
> Fuego mailing list
> Fuego@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/fuego
* [PATCH] [RFC] bcachefs: SIX locks (shared/intent/exclusive)
From: Kent Overstreet @ 2018-05-22 2:19 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, linux-xfs, linux-btrfs, peterz
Cc: Kent Overstreet
New lock for bcachefs, like read/write locks but with a third state,
intent.
Intent locks conflict with each other, but not with read locks; taking a
write lock requires first holding an intent lock.
The purpose is for multi-node data structures (i.e. btrees), where if we were
using read/write locks we might need to hold a write lock for the duration of an
operation purely to avoid deadlocks.
For example, when splitting a btree node we lock a leaf node, allocate two new
nodes, copy the contents of the old node into the new nodes, then update the
parent node. With read/write locks, we'd need to hold a write lock on the parent
node for the entire duration, because if we don't take it until we have to
update the parent we deadlock - because we still have a child locked - and we
can't unlock the child, because we need to free it as soon as we've deleted the
pointer to it. This blocks not only lookups, but updates that would only touch
unrelated leaf nodes.
With intent locks, we can hold an intent lock on the parent node for the
duration of the operation, only taking a write lock on it for the update to that
specific node.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
I implemented these for bcachefs's btrees, but they're really not bcachefs
specific - I figure since I don't have the only btree implementation it might be
worth at least seeing if anyone else sees a use for them.
Also, they use osq_lock()/unlock(), which Peter Zijlstra really doesn't want
exported, so if anyone else would like to use them they could be moved out of
fs/bcachefs/ and that would solve another problem for me :)
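As a rough illustration of the compatibility rules in the commit message
(read is blocked only by write, intent conflicts only with intent, write
requires intent and no readers), here is a minimal userspace sketch. It is
a toy, single-threaded model under stated assumptions: the toy_* names are
hypothetical, there are no atomics, waitlists, recursion or sequence
numbers, and it is not how six.c implements the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names are hypothetical, no atomics, no waiters. */
struct toy_six {
	unsigned readers;	/* count of read holders */
	bool intent;		/* intent held (at most one holder) */
	bool write;		/* write held (requires intent) */
};

/* Read is blocked only by a held write lock. */
static bool toy_try_read(struct toy_six *l)
{
	if (l->write)
		return false;
	l->readers++;
	return true;
}

/* Intent conflicts with another intent, but not with readers. */
static bool toy_try_intent(struct toy_six *l)
{
	if (l->intent)
		return false;
	l->intent = true;
	return true;
}

/* Write needs intent already held, and fails while readers remain. */
static bool toy_try_write(struct toy_six *l)
{
	assert(l->intent && !l->write);
	if (l->readers)
		return false;
	l->write = true;
	return true;
}

static void toy_unlock_read(struct toy_six *l)
{
	assert(l->readers > 0);
	l->readers--;
}

static void toy_unlock_write(struct toy_six *l)
{
	assert(l->write);
	l->write = false;
}

static void toy_unlock_intent(struct toy_six *l)
{
	assert(l->intent && !l->write);
	l->intent = false;
}
```

In terms of the split example above, a thread would hold intent on the
parent for the whole operation (so lookups still get their read locks) and
take write only around the actual pointer update.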
fs/bcachefs/six.c | 516 ++++++++++++++++++++++++++++++++++++++++++++++
fs/bcachefs/six.h | 190 +++++++++++++++++
2 files changed, 706 insertions(+)
create mode 100644 fs/bcachefs/six.c
create mode 100644 fs/bcachefs/six.h
diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
new file mode 100644
index 0000000000..afa59a476a
--- /dev/null
+++ b/fs/bcachefs/six.c
@@ -0,0 +1,516 @@
+
+#include <linux/log2.h>
+#include <linux/preempt.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/sched/rt.h>
+
+#include "six.h"
+
+#define six_acquire(l, t) lock_acquire(l, 0, t, 0, 0, NULL, _RET_IP_)
+#define six_release(l) lock_release(l, 0, _RET_IP_)
+
+struct six_lock_vals {
+ /* Value we add to the lock in order to take the lock: */
+ u64 lock_val;
+
+ /* If the lock has this value (used as a mask), taking the lock fails: */
+ u64 lock_fail;
+
+ /* Value we add to the lock in order to release the lock: */
+ u64 unlock_val;
+
+ /* Mask that indicates lock is held for this type: */
+ u64 held_mask;
+
+ /* Waitlist we wakeup when releasing the lock: */
+ enum six_lock_type unlock_wakeup;
+};
+
+#define __SIX_LOCK_HELD_read __SIX_VAL(read_lock, ~0)
+#define __SIX_LOCK_HELD_intent __SIX_VAL(intent_lock, ~0)
+#define __SIX_LOCK_HELD_write __SIX_VAL(seq, 1)
+
+#define LOCK_VALS { \
+ [SIX_LOCK_read] = { \
+ .lock_val = __SIX_VAL(read_lock, 1), \
+ .lock_fail = __SIX_LOCK_HELD_write, \
+ .unlock_val = -__SIX_VAL(read_lock, 1), \
+ .held_mask = __SIX_LOCK_HELD_read, \
+ .unlock_wakeup = SIX_LOCK_write, \
+ }, \
+ [SIX_LOCK_intent] = { \
+ .lock_val = __SIX_VAL(intent_lock, 1), \
+ .lock_fail = __SIX_LOCK_HELD_intent, \
+ .unlock_val = -__SIX_VAL(intent_lock, 1), \
+ .held_mask = __SIX_LOCK_HELD_intent, \
+ .unlock_wakeup = SIX_LOCK_intent, \
+ }, \
+ [SIX_LOCK_write] = { \
+ .lock_val = __SIX_VAL(seq, 1), \
+ .lock_fail = __SIX_LOCK_HELD_read, \
+ .unlock_val = __SIX_VAL(seq, 1), \
+ .held_mask = __SIX_LOCK_HELD_write, \
+ .unlock_wakeup = SIX_LOCK_read, \
+ }, \
+}
+
+static inline void six_set_owner(struct six_lock *lock, enum six_lock_type type,
+ union six_lock_state old)
+{
+ if (type != SIX_LOCK_intent)
+ return;
+
+ if (!old.intent_lock) {
+ EBUG_ON(lock->owner);
+ lock->owner = current;
+ } else {
+ EBUG_ON(lock->owner != current);
+ }
+}
+
+static inline void six_clear_owner(struct six_lock *lock, enum six_lock_type type)
+{
+ if (type != SIX_LOCK_intent)
+ return;
+
+ EBUG_ON(lock->owner != current);
+
+ if (lock->state.intent_lock == 1)
+ lock->owner = NULL;
+}
+
+static __always_inline bool do_six_trylock_type(struct six_lock *lock,
+ enum six_lock_type type)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+ union six_lock_state old;
+ u64 v = READ_ONCE(lock->state.v);
+
+ EBUG_ON(type == SIX_LOCK_write && lock->owner != current);
+
+ do {
+ old.v = v;
+
+ EBUG_ON(type == SIX_LOCK_write &&
+ ((old.v & __SIX_LOCK_HELD_write) ||
+ !(old.v & __SIX_LOCK_HELD_intent)));
+
+ if (old.v & l[type].lock_fail)
+ return false;
+ } while ((v = atomic64_cmpxchg_acquire(&lock->state.counter,
+ old.v,
+ old.v + l[type].lock_val)) != old.v);
+
+ six_set_owner(lock, type, old);
+ return true;
+}
+
+__always_inline __flatten
+static bool __six_trylock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ if (!do_six_trylock_type(lock, type))
+ return false;
+
+ six_acquire(&lock->dep_map, 1);
+ return true;
+}
+
+__always_inline __flatten
+static bool __six_relock_type(struct six_lock *lock, enum six_lock_type type,
+ unsigned seq)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+ union six_lock_state old;
+ u64 v = READ_ONCE(lock->state.v);
+
+ do {
+ old.v = v;
+
+ if (old.seq != seq || old.v & l[type].lock_fail)
+ return false;
+ } while ((v = atomic64_cmpxchg_acquire(&lock->state.counter,
+ old.v,
+ old.v + l[type].lock_val)) != old.v);
+
+ six_set_owner(lock, type, old);
+ six_acquire(&lock->dep_map, 1);
+ return true;
+}
+
+struct six_lock_waiter {
+ struct list_head list;
+ struct task_struct *task;
+};
+
+/* This is probably up there with the more evil things I've done */
+#define waitlist_bitnr(id) ilog2((((union six_lock_state) { .waiters = 1 << (id) }).l))
+
+#ifdef CONFIG_LOCK_SPIN_ON_OWNER
+
+static inline int six_can_spin_on_owner(struct six_lock *lock)
+{
+ struct task_struct *owner;
+ int retval = 1;
+
+ if (need_resched())
+ return 0;
+
+ rcu_read_lock();
+ owner = READ_ONCE(lock->owner);
+ if (owner)
+ retval = owner->on_cpu;
+ rcu_read_unlock();
+ /*
+ * if lock->owner is not set, the mutex owner may have just acquired
+ * it and not set the owner yet or the mutex has been released.
+ */
+ return retval;
+}
+
+static inline bool six_spin_on_owner(struct six_lock *lock,
+ struct task_struct *owner)
+{
+ bool ret = true;
+
+ rcu_read_lock();
+ while (lock->owner == owner) {
+ /*
+ * Ensure we emit the owner->on_cpu, dereference _after_
+ * checking lock->owner still matches owner. If that fails,
+ * owner might point to freed memory. If it still matches,
+ * the rcu_read_lock() ensures the memory stays valid.
+ */
+ barrier();
+
+ if (!owner->on_cpu || need_resched()) {
+ ret = false;
+ break;
+ }
+
+ cpu_relax();
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static inline bool six_optimistic_spin(struct six_lock *lock, enum six_lock_type type)
+{
+ struct task_struct *task = current;
+
+ if (type == SIX_LOCK_write)
+ return false;
+
+ preempt_disable();
+ if (!six_can_spin_on_owner(lock))
+ goto fail;
+
+ if (!osq_lock(&lock->osq))
+ goto fail;
+
+ while (1) {
+ struct task_struct *owner;
+
+ /*
+ * If there's an owner, wait for it to either
+ * release the lock or go to sleep.
+ */
+ owner = READ_ONCE(lock->owner);
+ if (owner && !six_spin_on_owner(lock, owner))
+ break;
+
+ if (do_six_trylock_type(lock, type)) {
+ osq_unlock(&lock->osq);
+ preempt_enable();
+ return true;
+ }
+
+ /*
+ * When there's no owner, we might have preempted between the
+ * owner acquiring the lock and setting the owner field. If
+ * we're an RT task that will live-lock because we won't let
+ * the owner complete.
+ */
+ if (!owner && (need_resched() || rt_task(task)))
+ break;
+
+ /*
+ * The cpu_relax() call is a compiler barrier which forces
+ * everything in this loop to be re-loaded. We don't need
+ * memory barriers as we'll eventually observe the right
+ * values at the cost of a few extra spins.
+ */
+ cpu_relax();
+ }
+
+ osq_unlock(&lock->osq);
+fail:
+ preempt_enable();
+
+ /*
+ * If we fell out of the spin path because of need_resched(),
+ * reschedule now, before we try-lock again. This avoids getting
+ * scheduled out right after we obtained the lock.
+ */
+ if (need_resched())
+ schedule();
+
+ return false;
+}
+
+#else /* CONFIG_LOCK_SPIN_ON_OWNER */
+
+static inline bool six_optimistic_spin(struct six_lock *lock, enum six_lock_type type)
+{
+ return false;
+}
+
+#endif
+
+noinline
+static void __six_lock_type_slowpath(struct six_lock *lock, enum six_lock_type type)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+ union six_lock_state old, new;
+ struct six_lock_waiter wait;
+ u64 v;
+
+ if (six_optimistic_spin(lock, type))
+ return;
+
+ lock_contended(&lock->dep_map, _RET_IP_);
+
+ INIT_LIST_HEAD(&wait.list);
+ wait.task = current;
+
+ while (1) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (type == SIX_LOCK_write)
+ EBUG_ON(lock->owner != current);
+ else if (list_empty_careful(&wait.list)) {
+ raw_spin_lock(&lock->wait_lock);
+ list_add_tail(&wait.list, &lock->wait_list[type]);
+ raw_spin_unlock(&lock->wait_lock);
+ }
+
+ v = READ_ONCE(lock->state.v);
+ do {
+ new.v = old.v = v;
+
+ if (!(old.v & l[type].lock_fail))
+ new.v += l[type].lock_val;
+ else if (!(new.waiters & (1 << type)))
+ new.waiters |= 1 << type;
+ else
+ break; /* waiting bit already set */
+ } while ((v = atomic64_cmpxchg_acquire(&lock->state.counter,
+ old.v, new.v)) != old.v);
+
+ if (!(old.v & l[type].lock_fail))
+ break;
+
+ schedule();
+ }
+
+ six_set_owner(lock, type, old);
+
+ __set_current_state(TASK_RUNNING);
+
+ if (!list_empty_careful(&wait.list)) {
+ raw_spin_lock(&lock->wait_lock);
+ list_del_init(&wait.list);
+ raw_spin_unlock(&lock->wait_lock);
+ }
+}
+
+__always_inline
+static void __six_lock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ six_acquire(&lock->dep_map, 0);
+
+ if (!do_six_trylock_type(lock, type))
+ __six_lock_type_slowpath(lock, type);
+
+ lock_acquired(&lock->dep_map, _RET_IP_);
+}
+
+static inline void six_lock_wakeup(struct six_lock *lock,
+ union six_lock_state state,
+ unsigned waitlist_id)
+{
+ struct list_head *wait_list = &lock->wait_list[waitlist_id];
+ struct six_lock_waiter *w, *next;
+
+ if (waitlist_id == SIX_LOCK_write && state.read_lock)
+ return;
+
+ if (!(state.waiters & (1 << waitlist_id)))
+ return;
+
+ clear_bit(waitlist_bitnr(waitlist_id),
+ (unsigned long *) &lock->state.v);
+
+ if (waitlist_id == SIX_LOCK_write) {
+ struct task_struct *p = READ_ONCE(lock->owner);
+
+ if (p)
+ wake_up_process(p);
+ return;
+ }
+
+ raw_spin_lock(&lock->wait_lock);
+
+ list_for_each_entry_safe(w, next, wait_list, list) {
+ list_del_init(&w->list);
+
+ if (wake_up_process(w->task) &&
+ waitlist_id != SIX_LOCK_read) {
+ if (!list_empty(wait_list))
+ set_bit(waitlist_bitnr(waitlist_id),
+ (unsigned long *) &lock->state.v);
+ break;
+ }
+ }
+
+ raw_spin_unlock(&lock->wait_lock);
+}
+
+__always_inline __flatten
+static void __six_unlock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+ union six_lock_state state;
+
+ EBUG_ON(!(lock->state.v & l[type].held_mask));
+ EBUG_ON(type == SIX_LOCK_write &&
+ !(lock->state.v & __SIX_LOCK_HELD_intent));
+
+ six_clear_owner(lock, type);
+
+ state.v = atomic64_add_return_release(l[type].unlock_val,
+ &lock->state.counter);
+ six_release(&lock->dep_map);
+ six_lock_wakeup(lock, state, l[type].unlock_wakeup);
+}
+
+#ifdef SIX_LOCK_SEPARATE_LOCKFNS
+
+#define __SIX_LOCK(type) \
+bool six_trylock_##type(struct six_lock *lock) \
+{ \
+ return __six_trylock_type(lock, SIX_LOCK_##type); \
+} \
+ \
+bool six_relock_##type(struct six_lock *lock, u32 seq) \
+{ \
+ return __six_relock_type(lock, SIX_LOCK_##type, seq); \
+} \
+ \
+void six_lock_##type(struct six_lock *lock) \
+{ \
+ __six_lock_type(lock, SIX_LOCK_##type); \
+} \
+ \
+void six_unlock_##type(struct six_lock *lock) \
+{ \
+ __six_unlock_type(lock, SIX_LOCK_##type); \
+}
+
+__SIX_LOCK(read)
+__SIX_LOCK(intent)
+__SIX_LOCK(write)
+
+#undef __SIX_LOCK
+
+#else
+
+bool six_trylock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ return __six_trylock_type(lock, type);
+}
+
+bool six_relock_type(struct six_lock *lock, enum six_lock_type type,
+ unsigned seq)
+{
+ return __six_relock_type(lock, type, seq);
+
+}
+
+void six_lock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ __six_lock_type(lock, type);
+}
+
+void six_unlock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ __six_unlock_type(lock, type);
+}
+
+#endif
+
+/* Convert from intent to read: */
+void six_lock_downgrade(struct six_lock *lock)
+{
+ six_lock_increment(lock, SIX_LOCK_read);
+ six_unlock_intent(lock);
+}
+
+bool six_lock_tryupgrade(struct six_lock *lock)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+ union six_lock_state old, new;
+ u64 v = READ_ONCE(lock->state.v);
+
+ do {
+ new.v = old.v = v;
+
+ EBUG_ON(!(old.v & l[SIX_LOCK_read].held_mask));
+
+ new.v += l[SIX_LOCK_read].unlock_val;
+
+ if (new.v & l[SIX_LOCK_intent].lock_fail)
+ return false;
+
+ new.v += l[SIX_LOCK_intent].lock_val;
+ } while ((v = atomic64_cmpxchg_acquire(&lock->state.counter,
+ old.v, new.v)) != old.v);
+
+ six_set_owner(lock, SIX_LOCK_intent, old);
+ six_lock_wakeup(lock, new, l[SIX_LOCK_read].unlock_wakeup);
+
+ return true;
+}
+
+bool six_trylock_convert(struct six_lock *lock,
+ enum six_lock_type from,
+ enum six_lock_type to)
+{
+ EBUG_ON(to == SIX_LOCK_write || from == SIX_LOCK_write);
+
+ if (to == from)
+ return true;
+
+ if (to == SIX_LOCK_read) {
+ six_lock_downgrade(lock);
+ return true;
+ } else {
+ return six_lock_tryupgrade(lock);
+ }
+}
+
+/*
+ * Increment read/intent lock count, assuming we already have it read or intent
+ * locked:
+ */
+void six_lock_increment(struct six_lock *lock, enum six_lock_type type)
+{
+ const struct six_lock_vals l[] = LOCK_VALS;
+
+ EBUG_ON(type == SIX_LOCK_write);
+ six_acquire(&lock->dep_map, 0);
+
+ /* XXX: assert already locked, and that we don't overflow: */
+
+ atomic64_add(l[type].lock_val, &lock->state.counter);
+}
diff --git a/fs/bcachefs/six.h b/fs/bcachefs/six.h
new file mode 100644
index 0000000000..f518c64c40
--- /dev/null
+++ b/fs/bcachefs/six.h
@@ -0,0 +1,190 @@
+#ifndef _BCACHEFS_SIX_H
+#define _BCACHEFS_SIX_H
+
+#include <linux/lockdep.h>
+#include <linux/osq_lock.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include "util.h"
+
+#define SIX_LOCK_SEPARATE_LOCKFNS
+
+/*
+ * LOCK STATES:
+ *
+ * read, intent, write (i.e. shared/intent/exclusive, hence the name)
+ *
+ * read and write work as with normal read/write locks - a lock can have
+ * multiple readers, but write excludes reads and other write locks.
+ *
+ * Intent does not block read, but it does block other intent locks. The idea is
+ * by taking an intent lock, you can then later upgrade to a write lock without
+ * dropping your read lock and without deadlocking - because no other thread has
+ * the intent lock and thus no other thread could be trying to take the write
+ * lock.
+ */
+
+union six_lock_state {
+ struct {
+ atomic64_t counter;
+ };
+
+ struct {
+ u64 v;
+ };
+
+ struct {
+ /* for waitlist_bitnr() */
+ unsigned long l;
+ };
+
+ struct {
+ unsigned read_lock:26;
+ unsigned intent_lock:3;
+ unsigned waiters:3;
+ /*
+ * seq works much like in seqlocks: it's incremented every time
+ * we lock and unlock for write.
+ *
+ * If it's odd write lock is held, even unlocked.
+ *
+ * Thus readers can unlock, and then lock again later iff it
+ * hasn't been modified in the meantime.
+ */
+ u32 seq;
+ };
+};
+
+#define SIX_LOCK_MAX_RECURSE ((1 << 3) - 1)
+
+enum six_lock_type {
+ SIX_LOCK_read,
+ SIX_LOCK_intent,
+ SIX_LOCK_write,
+};
+
+struct six_lock {
+ union six_lock_state state;
+ struct task_struct *owner;
+ struct optimistic_spin_queue osq;
+
+ raw_spinlock_t wait_lock;
+ struct list_head wait_list[2];
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+static __always_inline void __six_lock_init(struct six_lock *lock,
+ const char *name,
+ struct lock_class_key *key)
+{
+ atomic64_set(&lock->state.counter, 0);
+ raw_spin_lock_init(&lock->wait_lock);
+ INIT_LIST_HEAD(&lock->wait_list[SIX_LOCK_read]);
+ INIT_LIST_HEAD(&lock->wait_list[SIX_LOCK_intent]);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ debug_check_no_locks_freed((void *) lock, sizeof(*lock));
+ lockdep_init_map(&lock->dep_map, name, key, 0);
+#endif
+}
+
+#define six_lock_init(lock) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __six_lock_init((lock), #lock, &__key); \
+} while (0)
+
+#define __SIX_VAL(field, _v) (((union six_lock_state) { .field = _v }).v)
+
+#ifdef SIX_LOCK_SEPARATE_LOCKFNS
+
+#define __SIX_LOCK(type) \
+bool six_trylock_##type(struct six_lock *); \
+bool six_relock_##type(struct six_lock *, u32); \
+void six_lock_##type(struct six_lock *); \
+void six_unlock_##type(struct six_lock *);
+
+__SIX_LOCK(read)
+__SIX_LOCK(intent)
+__SIX_LOCK(write)
+#undef __SIX_LOCK
+
+#define SIX_LOCK_DISPATCH(type, fn, ...) \
+ switch (type) { \
+ case SIX_LOCK_read: \
+ return fn##_read(__VA_ARGS__); \
+ case SIX_LOCK_intent: \
+ return fn##_intent(__VA_ARGS__); \
+ case SIX_LOCK_write: \
+ return fn##_write(__VA_ARGS__); \
+ default: \
+ BUG(); \
+ }
+
+static inline bool six_trylock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ SIX_LOCK_DISPATCH(type, six_trylock, lock);
+}
+
+static inline bool six_relock_type(struct six_lock *lock, enum six_lock_type type,
+ unsigned seq)
+{
+ SIX_LOCK_DISPATCH(type, six_relock, lock, seq);
+}
+
+static inline void six_lock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ SIX_LOCK_DISPATCH(type, six_lock, lock);
+}
+
+static inline void six_unlock_type(struct six_lock *lock, enum six_lock_type type)
+{
+ SIX_LOCK_DISPATCH(type, six_unlock, lock);
+}
+
+#else
+
+bool six_trylock_type(struct six_lock *, enum six_lock_type);
+bool six_relock_type(struct six_lock *, enum six_lock_type, unsigned);
+void six_lock_type(struct six_lock *, enum six_lock_type);
+void six_unlock_type(struct six_lock *, enum six_lock_type);
+
+#define __SIX_LOCK(type) \
+static __always_inline bool six_trylock_##type(struct six_lock *lock) \
+{ \
+ return six_trylock_type(lock, SIX_LOCK_##type); \
+} \
+ \
+static __always_inline bool six_relock_##type(struct six_lock *lock, u32 seq)\
+{ \
+ return six_relock_type(lock, SIX_LOCK_##type, seq); \
+} \
+ \
+static __always_inline void six_lock_##type(struct six_lock *lock) \
+{ \
+ six_lock_type(lock, SIX_LOCK_##type); \
+} \
+ \
+static __always_inline void six_unlock_##type(struct six_lock *lock) \
+{ \
+ six_unlock_type(lock, SIX_LOCK_##type); \
+}
+
+__SIX_LOCK(read)
+__SIX_LOCK(intent)
+__SIX_LOCK(write)
+#undef __SIX_LOCK
+
+#endif
+
+void six_lock_downgrade(struct six_lock *);
+bool six_lock_tryupgrade(struct six_lock *);
+bool six_trylock_convert(struct six_lock *, enum six_lock_type,
+ enum six_lock_type);
+
+void six_lock_increment(struct six_lock *, enum six_lock_type);
+
+#endif /* _BCACHEFS_SIX_H */
--
2.17.0
* Re: [PATCH] regex: do not call `regfree()` if compilation fails
From: Eric Sunshine @ 2018-05-22 2:20 UTC (permalink / raw)
To: Stefan Beller; +Cc: Martin Ågren, git, Johannes Schindelin
In-Reply-To: <CAGZ79kZotwAFauTkCJ6YZ_C-MuaQpNaaS8LCniL_Or=_ccfC4w@mail.gmail.com>
On Mon, May 21, 2018 at 2:43 PM, Stefan Beller <sbeller@google.com> wrote:
> On Sun, May 20, 2018 at 3:50 AM, Martin Ågren <martin.agren@gmail.com> wrote:
>> It is apparently undefined behavior to call `regfree()` on a regex where
>> `regcomp()` failed. [...]
>>
>> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
>> @@ -215,7 +215,6 @@ static void regcomp_or_die(regex_t *regex, const char *needle, int cflags)
>> /* The POSIX.2 people are surely sick */
>> char errbuf[1024];
>> regerror(err, regex, errbuf, 1024);
>> - regfree(regex);
>> die("invalid regex: %s", errbuf);
>
> While the commit message is very clear why we supposedly introduce a leak here,
> it is hard to be found from the source code (as we only delete code
> there, so digging
> for history is not obvious), so maybe
>
> /* regfree(regex) is invalid here */
>
> instead?
The commit message doesn't say that we are supposedly introducing a
leak (and, indeed, no resources should have been allocated to the
'regex' upon failed compile). It's saying that removing this call
potentially avoids a crash under some implementations.
Given that the very next line is die(), and that the function name has
"_or_die" in it, I'm not sure that an in-code comment about regfree()
being invalid upon failed compile would be all that helpful; indeed,
it could be confusing, causing the reader to wonder why that is
significant if we're just dying anyhow. I find that the patch, as is,
clarifies rather than muddles the situation.
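For a caller that does not die on failure, the same rule can be sketched
as below. This is only an illustration, not git's code: compile_checked()
is a hypothetical helper, and the point it shows is that regerror() is
used to get the message while regfree() is reserved for the success path.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not git's regcomp_or_die()): compile `pattern`
 * and, on failure, put a readable message in errbuf and return the
 * nonzero regcomp() error code.
 */
static int compile_checked(regex_t *re, const char *pattern, int cflags,
			   char *errbuf, size_t errlen)
{
	int err;

	assert(errbuf != NULL && errlen > 0);

	err = regcomp(re, pattern, cflags);
	if (err) {
		/* Fetch the message, mirroring the quoted diff context. */
		regerror(err, re, errbuf, errlen);
		/* Deliberately no regfree(re): compilation failed. */
		return err;
	}
	return 0;
}
```

The caller then owns the regex_t only when the helper returns 0, and is
expected to call regfree() exactly once in that case.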