sergey senozhatsky's blog: November 2010

Saturday, November 27, 2010

"#39861: Switch module doesn't like subroutine prototypes" part 2

Spent some time playing with perl Switch bug.

Well, let's start from the begging.
The sort of main parsing routine in Switch.pm is
sub filter_blocks, which tries to parse $source and create $text.

failter_blocks calls Text::Balanced::_match_* to distinguish code blocks
from quoted, to validate variables, etc... and that's the point where the
whole thing hits the fan. Text::Balanced::_match_variable call is too
early here, since _match_variable validates $#, $$, $^ and fails to
validate $). In other words, suppose, we are parsing

sub foo($) {
        switch($_[0]) {
                case /ACK/i {
                        return "ACK";
                }
                case /NACK/i {
                        return "NACK";
                }
        }
}

Text::Balanced::_match_variable will return the whole block, since
it doesn't know what to do with $). Yet it works for $$, $,, etc. quite
well. My first solution was change

m{\G\$\s*(?!::)(\d+|[][&`'+*./|,";%=~:?!\@<>()-]|\^[a-z]?)}gci)

to

m{\G\$\)\?\s*(?!::)(\d+|[][&`'+*./|,";%=~:?!\@<>()-]|\^[a-z]?)}gci)

so, we now parse $) correctly. However, I didn't like it much.

What we (IMHO) really should do - is to teach filter_blocks what
subroutine is. I did very simple and general (which may fail for
some sophisticated cases) thing:

diff --git a/Switch.pm b/Switch.pm
index 2189ae0..781bae8 100755
--- a/Switch.pm
+++ b/Switch.pm
@@ -111,6 +111,11 @@ sub filter_blocks
             }
             next component;
         }
+        if ($source =~ m/\G(\s*sub.+)\{/) {
+            $text .= $1;
+            pos $source += length($1);
+            next component;
+        }
         if ($source =~ m/(\G\s*$pod_or_DATA)/gc) {
             $text .= $1;
             next component;

which shifts position in currently parsed $source to avoid wrong
Text::Balanced::_match_variable call on subroutine declaration.

Of course, this is not tested at all, except for my simple script.
Just playing.

posix cpu timers: RCU read-side critical section

POSIX cpu timers were calling find_task_by_vpid in insecure way
(since 4221a9918e38b7494cee341dda7b7b4bb8c04bde which requires
RCU read-side critical section).

Thomas Gleixner wrote:
| We can remove the tasklist_lock while at it. rcu_read_lock is enough.

Patch also replaces thread_group_leader with has_group_leader_pid
in accordance to comment by Oleg Nesterov:

| ... thread_group_leader() check is not relaible without
| tasklist. If we race with de_thread() find_task_by_vpid() can find
| the new leader before it updates its ->group_leader.
|
| perhaps it makes sense to change posix_cpu_timer_create() to use
| has_group_leader_pid() instead, just to make this code not look racy
| and avoid adding new problems.

Thanks to:
    Reviewed-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Stanislaw Gruszka
    Signed-off-by: Thomas Gleixner

BTW, Oleg, in turn, fixed security issue
commit e0a70217107e6f9844628120412cb27bb4cea194
Author: Oleg Nesterov <>
posix-cpu-timers: workaround to suppress the problems with mt exec

---

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 6842eeb..05bb717 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -37,13 +37,13 @@ static int check_clock(const clockid_t which_clock)
     if (pid == 0)
         return 0;

-    read_lock(&tasklist_lock);
+    rcu_read_lock();
     p = find_task_by_vpid(pid);
     if (!p || !(CPUCLOCK_PERTHREAD(which_clock) ?
-           same_thread_group(p, current) : thread_group_leader(p))) {
+           same_thread_group(p, current) : has_group_leader_pid(p))) {
         error = -EINVAL;
     }
-    read_unlock(&tasklist_lock);
+    rcu_read_unlock();

     return error;
}
@@ -390,7 +390,7 @@ int posix_cpu_timer_create(struct k_itimer *new_timer)

     INIT_LIST_HEAD(&new_timer->it.cpu.entry);

-    read_lock(&tasklist_lock);
+    rcu_read_lock();
     if (CPUCLOCK_PERTHREAD(new_timer->it_clock)) {
         if (pid == 0) {
             p = current;
@@ -404,7 +404,7 @@ int posix_cpu_timer_create(struct k_itimer *new_timer)
             p = current->group_leader;
         } else {
             p = find_task_by_vpid(pid);
-            if (p && !thread_group_leader(p))
+            if (p && !has_group_leader_pid(p))
                 p = NULL;
         }
     }
@@ -414,7 +414,7 @@ int posix_cpu_timer_create(struct k_itimer *new_timer)
     } else {
         ret = -EINVAL;
     }
-    read_unlock(&tasklist_lock);
+    rcu_read_unlock();

     return ret;
}

Friday, November 26, 2010

"#39861: Switch module doesn't like subroutine prototypes"

Recently I faced an unexpected perl5's behaviour.
This simple and valid code

#!/usr/bin/perl
use POSIX;
use strict;

use Switch;

sub foo($) {
    switch($_[0]) {
        case /ACK/i {
            return "ACK";
        }
        case /NACK/i {
            return "NACK";
        }
    }
}

1;
failed to execute due to
syntax error at ./test.pl line 8, near ") {"
syntax error at ./test.pl line 15, near "}"
Bareword "case" not allowed while "strict subs" in use at ./test.pl line 12.
Bareword "NACK" not allowed while "strict subs" in use at ./test.pl line 12.
Execution of ./test.pl aborted due to compilation errors.

If we change foo's prototype requirement to expect more than one scalar
parameters, or drop any requirements - everything will cure and work just
as expected. IOW, case with one scalar parameter is kind of special (read broken).

This turned out to be known problem (#39861, according to Perl5's bugzilla )...
Since around 2004. Only one thing remains to write - WTF?

Sunday, November 14, 2010

Eliminate instructions at compile time trick

Vasiliy Kulikov noted a bug in select code and proposed a fix:
[..] struct timeval has padding bytes at the end. This struct is copied to
userspace with these padding bytes uninitialized. This leads to leaking
of contents of kernel stack memory.

--- a/fs/select.c
+++ b/fs/select.c
@@ -306,6 +306,7 @@ static int poll_select_copy_remaining(struct timespec
*end_time, void __user *p,
               rts.tv_sec = rts.tv_nsec = 0;

       if (timeval) {
+             memset(&rtv, 0, sizeof(rtv));
               rtv.tv_sec = rts.tv_sec;
               rtv.tv_usec = rts.tv_nsec / NSEC_PER_USEC;

Andrew Morton noted that
| struct timeval has padding bytes at the end.
On sparc and parisc. On all other architectures this patch is a waste
of cycles.

And came up with this patch:

       if (timeval) {
-             memset(&rtv, 0, sizeof(rtv));
+             if (sizeof(rtv) > sizeof(rtv.tv_sec) + sizeof(rtv.tv_usec))
+                     memset(&rtv, 0, sizeof(rtv));
               rtv.tv_sec = rts.tv_sec;
               rtv.tv_usec = rts.tv_nsec / NSEC_PER_USEC;

The `if' gets eliminated at compile time. With this approach we add
four bytes of text to the sparc64 build and zero bytes of text to the
x86_64 build.

ioprio: rcu protect find_task_by_vpid call

Since commit 4221a9918e38b7494cee341dda7b7b4bb8c04bde
find_task_by_pid_ns call needs to be protected with RCU lock.

Tetsuo Handa wrote:

| Usually tasklist gives enough protection, but if copy_process() fails

| it calls free_pid() lockless and does call_rcu(delayed_put_pid().

| This means, without rcu lock find_pid_ns() can't scan the hash table

| safely.

"Unsafe" find_task_by_pid_ns call may look like this:

Call Trace:

[<ffffffff810656f2>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffff81053c67>] find_task_by_pid_ns+0x4f/0x68
[<ffffffff81053c9d>] find_task_by_vpid+0x1d/0x1f
[<ffffffff811104e2>] sys_ioprio_get+0x50/0x2da
[<ffffffff81002182>] system_call_fastpath+0x16/0x1b

V2: rcu critical section expanded according to comment
by Paul E. McKenney.

The patch below adds missing rcu in sys_ioprio_{set|get}.

--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -111,12 +111,14 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
        read_lock(&tasklist_lock);
        switch (which) {
                case IOPRIO_WHO_PROCESS:
+                       rcu_read_lock();
                        if (!who)
                                p = current;
                        else
                                p = find_task_by_vpid(who);
                        if (p)
                                ret = set_task_ioprio(p, ioprio);
+                       rcu_read_unlock();
                        break;
                case IOPRIO_WHO_PGRP:
                        if (!who)
@@ -205,12 +207,14 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
        read_lock(&tasklist_lock);
        switch (which) {
                case IOPRIO_WHO_PROCESS:
+                       rcu_read_lock();
                        if (!who)
                                p = current;
                        else
                                p = find_task_by_vpid(who);
                        if (p)
                                ret = get_task_ioprio(p);
+                       rcu_read_unlock();
                        break;
                case IOPRIO_WHO_PGRP:
                        if (!who)

Monday, November 8, 2010

"We sometimes do this trick"

Recently on lkml we had a patch proposal by Don Zickus.

I had a minor nit, because I thought that it does make sense
to simplify this loop

       touch_all_nmi_watchdogs:
       ...
       for_each_present_cpu(cpu) {
               if (per_cpu(watchdog_nmi_touch, cpu) != true)
                       per_cpu(watchdog_nmi_touch, cpu) = true;
       }
to
       for_each_present_cpu(cpu) {
               per_cpu(watchdog_nmi_touch, cpu) = true;
       }

Andrew Morton wrote in responce:

We sometimes do this trick to avoid dirtying lots of cachelines
which already held the correct value. It'll be extra-benefical
when dealing with other CPU's data, I expect.

This is really reasonable. Once again, try to think in opposite
each time you make a decision.

Friday, November 5, 2010

Happy birthday to Me.

Almost as yong as GNU

on photo Stephen Fry

Wednesday, November 3, 2010

"I really do want to do the merge"

Words of wisdom by Linus Torvalds

"..I do feel that actually seeing the merge conflicts really does help me get a feel

for what I'm merging.."

On Sat, Oct 30, 2010 at 6:51 AM, Chris Mason wrote:
>
> There were some minor conflicts with Linus' current tree, so my branch
> is merged with Linus' tree as of this morning.

Gaah. Please don't do this. Unless it's a _really_ messy merge, I
really do want to do the merge. It's fine to have an alternate
pre-merged branch for me to compare against, but please do that
separately.

So what I did was to just instead merge the state before your merge,
and in the process I:

 (a) noticed that your merge was incorrect (you had left around a
unused "error:" label in btrfs_mount()), since I did use your merge as
something to compare against (see above). That label had been removed
in your branch by  commit 0e78340f3c1f, but your merge resurrected it.

 (b) saw just how horribly nasty your writeback_inodes_sb() end result
was, and decided to clean up the estimation of dirty pages in order to
not end up with the function call argument from hell.

Now, it's obviously totally possible that I screwed things up entirely
in the process, but as mentioned elsewhere, I do feel that actually
seeing the merge conflicts really does help me get a feel for what I'm
merging, and what the points of conflict are.

And yes, maybe it's just me showing my insecurities again. I have
various mental hangups, and liking to feel like I know roughly what is
going on is one of them. Doing the merges and looking at the code that
clashes makes me feel like I have some kind of awareness of how things
are interacting in the development process.

linux-btrfs

sergey senozhatsky's blog