sergey senozhatsky's blog

Monday, November 25, 2013

Release of PowerTOP v2.5

We are pleased to announce the release of PowerTOP v2.5.

2.5 saw a number of fixes and improvements.
Thanks to everyone who helped and contributed.

Off the top of my head, there are 2 long-awaited improvements that people have been asking for:
0) powertop --auto-tune
toggles all tunables to the good state and exits.

1) show tunables toggle commands
e.g.
echo 'auto' > '/sys/bus/pci/devices/0000:ff:00.0/power/control';

For those of you who are interested in numbers

git diff --stat 99121654d..HEAD

67 files changed, 6116 insertions(+), 3895 deletions(-)

not bad.

release announcement:
https://01.org/powertop/blogs/kristen/2013/powertop-v2.5-released

Sunday, October 20, 2013

what was that...

gosh... 2013 Australian MotoGP round is the most ridiculous and confusing
race I've ever seen. split race with mandatory bike swap, black flags... really?

Friday, September 6, 2013

/proc/$PID/stat overflowed task utime

If you happen to see a bogus task run time (overflowed user time), CPU usage, etc. in top
or ps output with recent kernels

$ ps aux | grep rcu
root         8 0.0 0.0      0     0 ?        S    12:42   0:00 [rcuc/0]
root         9 0.0 0.0      0     0 ?        S    12:42   0:00 [rcub/0]
root        10 62422329 0.0 0     0 ?        R    12:42 21114581:37 [rcu_preempt]
root        11 0.1 0.0      0     0 ?        S    12:42   0:02 [rcuop/0]
root        12 62422329 0.0 0     0 ?        S    12:42 21114581:35 [rcuop/1]

or

cat /proc/10/stat
10 (rcu_preempt) S 2 0 0 0 -1 2129984 0 0 0 0 1844674407370 477 0 0 20 0 1 0 10 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0

then you will probably be interested in cherry-picking commit 5a8e01f8fa51f5cbce8f37acc050eb2319d12956
by Stanislaw Gruszka (or just wait for the next stable release).
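
To check for this from a program rather than by eye, here is a minimal sketch that pulls utime (field 14, in clock ticks) out of a /proc/$PID/stat line. The function names and the "more ticks than uptime" heuristic are my own assumptions for illustration, not something the kernel or procps provides. As a side note, the value in the example above equals 2^64 / 10^7 rounded down, which would be consistent with a small negative nanosecond count wrapped around as unsigned and converted to 10 ms ticks, but that reading is a guess.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract utime (field 14, in clock ticks) from the contents of
 * /proc/<pid>/stat. Returns 0 on success, -1 on a parse error.
 * The comm field may contain spaces and parentheses, so parsing
 * resumes after the last ')'.
 */
int stat_line_utime(const char *line, unsigned long long *utime)
{
        const char *p = strrchr(line, ')');

        if (!p)
                return -1;
        /* Skip the 11 fields between comm and utime (state .. cmajflt). */
        if (sscanf(p + 1,
                   " %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %llu",
                   utime) != 1)
                return -1;
        return 0;
}

/*
 * Hypothetical sanity check: a task cannot have spent more ticks in
 * user mode than uptime multiplied by the number of CPUs.
 */
int utime_is_bogus(unsigned long long utime_ticks,
                   unsigned long long uptime_ticks, unsigned int ncpus)
{
        return utime_ticks > uptime_ticks * ncpus;
}
```

Feeding it the contents of /proc/10/stat and the uptime converted to ticks is enough to flag the overflowed entries shown above.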

Sunday, May 26, 2013

snprintf()

Now I know that

The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated.
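
In practice this means truncation has to be checked explicitly. A minimal sketch of that check follows; format_label() and its "<name>-<id>" format are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<name>-<id>" into buf. Returns 0 on success, -1 if the
 * output was truncated or snprintf() reported an error.
 */
int format_label(char *buf, size_t size, const char *name, int id)
{
        int ret = snprintf(buf, size, "%s-%d", name, id);

        if (ret < 0)
                return -1;      /* output error */
        if ((size_t)ret >= size)
                return -1;      /* truncated: ret is the length needed */
        return 0;
}
```

Even in the truncated case the buffer stays null-terminated (as long as size is not 0), so the caller still gets a valid, shorter string.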

Saturday, January 12, 2013

gcc: moving to compile time

NOTE: below is gcc specific.

It's always hard to come up with examples, but let's say we have an item to process

struct __item {
        int flags;
        int prio;
        void *priv;
};

a number of possible types

#define IT_TYPE_A       (1 << 4)
#define IT_TYPE_B       (1 << 3)
#define IT_TYPE_C       (1 << 2)
#define IT_TYPE_ANY     (1 << 1)

and a usage example:

        struct __item it = {
                .flags = IT_TYPE_ANY | IT_TYPE_C,
                .prio = 0,
                .priv = "cron item processing",
        };
        [..]
        process_item(&it);

Let's say we deprecate (you know, because of reasons(tm)) the IT_TYPE_ANY flag.
Now we need to check and correct passed items each time process_item() is called,
even for flag values known at compile time, like the one above. In a trivial situation
this might be an almost free operation, but it depends.

A solution could be to split out a happy path for values known at compile
time and a slow path for run-time checks and adjustments.

IOW, we need to hide our original process_item() and define a new function

__fortify_function int process_item(struct __item *it)
{
        if (__builtin_constant_p (it->flags)) {
                if ((it->flags & IT_TYPE_ANY) != 0) {
                        __warn_type_any_deprecated();
                        it->flags &= ~IT_TYPE_ANY;
                        it->flags |= IT_TYPE_A;

                        /** something important **/
                }
        }
        return __process_item(it);
}

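that is able to test and correct passed items. __builtin_constant_p() is a
gcc builtin that returns 1 if the compiler can prove that its argument is a constant
known at compile time (with some exceptions), and __fortify_function is glibc's
shorthand (sys/cdefs.h) for __extern_always_inline __attribute_artificial__.
Because the function is always inlined, the check is evaluated in the caller's
context, where the flag value may be visible as a constant.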
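
To play with __builtin_constant_p() outside of the fortify machinery, here is a minimal sketch of the same fast path / slow path split. It uses a macro instead of an always-inline function so that the dispatch does not depend on the optimization level; fixup_flags(), fixup_flags_slow() and FIXUP_FLAGS_CONST() are hypothetical names, not part of the example above.

```c
#include <assert.h>

#define IT_TYPE_A       (1 << 4)
#define IT_TYPE_C       (1 << 2)
#define IT_TYPE_ANY     (1 << 1)

/* Slow path: a real function call, used for run-time values. */
int fixup_flags_slow(int flags)
{
        if (flags & IT_TYPE_ANY)
                flags = (flags & ~IT_TYPE_ANY) | IT_TYPE_A;
        return flags;
}

/* The same fix-up as a pure expression, so the compiler can fold it. */
#define FIXUP_FLAGS_CONST(f) \
        (((f) & IT_TYPE_ANY) ? (((f) & ~IT_TYPE_ANY) | IT_TYPE_A) : (f))

/* Use the folded expression for constants and the call otherwise. */
#define fixup_flags(f) \
        (__builtin_constant_p(f) ? FIXUP_FLAGS_CONST(f) : fixup_flags_slow(f))
```

Both paths must of course compute the same result; the only difference is whether the work happens at compile time or at run time.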

We might also want to inform the programmer about the usage of a
deprecated flag value, so we define an empty function

extern void __attribute__((deprecated)) __warn_type_any_deprecated(void) {}

Now the compiler emits a warning each time it sees a call to __warn_type_any_deprecated():

gcc -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -O2 test.c -o a.out
In file included from test.c:21:0:
item.h: In function ‘process_item’:
item.h:34:4: warning: ‘__warn_type_any_deprecated’ is deprecated (declared at item.h:25) [-Wdeprecated-declarations]

The good thing about this is that in the __builtin_constant_p() case the compiler will try
to do all checks and flag adjustments at compile time, producing code directly
for the "IT_TYPE_A | IT_TYPE_C" case:

    movl   $0x0,0x4(%rsp)
    movq   $0x4006be,0x8(%rsp)
    movl   $0x14,(%rsp)
    callq 0x400580 <__process_item>

Note movl   $0x14,(%rsp) (IT_TYPE_A | IT_TYPE_C) instead of movl   $0x6,(%rsp) (IT_TYPE_ANY | IT_TYPE_C).

A developer, however, is still able to ignore (or simply miss) the warning, so
in radical cases we can fail the build.

The sys/cdefs.h header file contains several defines that could be useful

#if __GNUC_PREREQ (4,3)
# define __warndecl(name, msg) \
  extern void name (void) __attribute__((__warning__ (msg)))
# define __warnattr(msg) __attribute__((__warning__ (msg)))
# define __errordecl(name, msg) \
  extern void name (void) __attribute__((__error__ (msg)))
#else
# define __warndecl(name, msg) extern void name (void)
# define __warnattr(msg)
# define __errordecl(name, msg) extern void name (void)
#endif

Changing the __warn_type_any_deprecated() prototype to

__warndecl (__warn_type_any_deprecated, "\n\tWARNING: TYPE ANY has been deprecated");

will change the output to:

gcc -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -O2 test.c -o a.out
In file included from test.c:21:0:
In function ‘process_item’,
    inlined from ‘main’ at test.c:63:14:
item.h:34:30: warning: call to ‘__warn_type_any_deprecated’ declared with attribute warning:
    WARNING: TYPE ANY has been deprecated [enabled by default]

and fail linkage (pretty hard to ignore), because __warndecl() declares a function
without a body:

test.c:(.text.startup+0x1d): undefined reference to `__warn_type_any_deprecated'

Thursday, December 27, 2012

stat() nanosecond time resolution

$ stat foo
[..]
Access: 2012-12-27 15:07:08.020388663 +0300
Modify: 2012-12-27 15:07:08.020388663 +0300
Change: 2012-12-27 15:07:08.020388663 +0300

where does nanosecond precision come from?

man 2 stat

           struct stat {
               dev_t     st_dev;     /* ID of device containing file */
               ino_t     st_ino;     /* inode number */
               mode_t    st_mode;    /* protection */
               nlink_t   st_nlink;   /* number of hard links */
               uid_t     st_uid;     /* user ID of owner */
               gid_t     st_gid;     /* group ID of owner */
               dev_t     st_rdev;    /* device ID (if special file) */
               off_t     st_size;    /* total size, in bytes */
               blksize_t st_blksize; /* blocksize for file system I/O */
               blkcnt_t st_blocks; /* number of 512B blocks allocated */
               time_t    st_atime;   /* time of last access */
               time_t    st_mtime;   /* time of last modification */
               time_t    st_ctime;   /* time of last status change */
           };

As we can see, struct stat contains st_atime/st_mtime/st_ctime, which are time_t, so it's quite
unlikely that they store nanosecond resolution.

The NOTES section says:

Since kernel 2.5.48, the stat structure supports nanosecond resolution
for the three file timestamp fields. Glibc exposes the nanosecond component
of each field using names of the form st_atim.tv_nsec if the _BSD_SOURCE
or _SVID_SOURCE feature test macro is defined. These fields are specified in
POSIX.1-2008, and, starting with version 2.12, glibc also exposes these field
names if _POSIX_C_SOURCE is defined with the value 200809L or greater, or
_XOPEN_SOURCE is defined with the value 700 or greater. If none of the
aforementioned macros are defined, then the nanosecond values are exposed
with names of the form st_atimensec. On file systems that do not support
subsecond timestamps, the nanosecond fields are returned with the value 0.

glibc defines struct stat as

struct stat
[..]
#if defined __USE_MISC || defined __USE_XOPEN2K8
      /* Nanosecond resolution timestamps are stored in a format
         equivalent to 'struct timespec'. This is the type used
         whenever possible but the Unix namespace rules do not allow the
         identifier 'timespec' to appear in the header.
         Therefore we have to handle the use of this header in strictly
         standard-compliant sources special. */
      struct timespec st_atim;          /* Time of last access. */
      struct timespec st_mtim;          /* Time of last modification. */
      struct timespec st_ctim;          /* Time of last status change. */
# define st_atime st_atim.tv_sec       /* Backward compatibility. */
# define st_mtime st_mtim.tv_sec
# define st_ctime st_ctim.tv_sec
#else
      __time_t st_atime;               /* Time of last access. */
      unsigned long int st_atimensec;    /* Nscecs of last access. */
      __time_t st_mtime;               /* Time of last modification. */
      unsigned long int st_mtimensec;    /* Nsecs of last modification. */
      __time_t st_ctime;               /* Time of last status change. */
      unsigned long int st_ctimensec;    /* Nsecs of last status change. */
#endif
[..]

Here is where the __USE_MISC and __USE_XOPEN2K8 definitions come from:

glibc/include/features.h

#if (_POSIX_C_SOURCE - 0) >= 200809L
# define __USE_XOPEN2K8       1
# undef _ATFILE_SOURCE
# define _ATFILE_SOURCE 1
#endif

or

# if (_XOPEN_SOURCE - 0) >= 600
#   if (_XOPEN_SOURCE - 0) >= 700
#    define __USE_XOPEN2K8     1
#    define __USE_XOPEN2K8XSI 1
#   endif

#if defined _BSD_SOURCE || defined _SVID_SOURCE
# define __USE_MISC    1
#endif

So it seems that defining _BSD_SOURCE (or anything else that results in
__USE_MISC or __USE_XOPEN2K8) is sufficient. However, real world code(tm) is a bit more
complicated. Here is a small part of coreutils' stat:

[..]
/* STAT_TIMESPEC (ST, ST_XTIM) is the ST_XTIM member for *ST of type
    struct timespec, if available. If not, then STAT_TIMESPEC_NS (ST,
    ST_XTIM) is the nanosecond component of the ST_XTIM member for *ST,
    if available. ST_XTIM can be st_atim, st_ctim, st_mtim, or st_birthtim
    for access, status change, data modification, or birth (creation)
    time respectively.

    These macros are private to stat-time.h. */
#if defined HAVE_STRUCT_STAT_ST_ATIM_TV_NSEC
# ifdef TYPEOF_STRUCT_STAT_ST_ATIM_IS_STRUCT_TIMESPEC
# define STAT_TIMESPEC(st, st_xtim) ((st)->st_xtim)
# else
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim.tv_nsec)
# endif
#elif defined HAVE_STRUCT_STAT_ST_ATIMESPEC_TV_NSEC
# define STAT_TIMESPEC(st, st_xtim) ((st)->st_xtim##espec)
#elif defined HAVE_STRUCT_STAT_ST_ATIMENSEC
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim##ensec)
#elif defined HAVE_STRUCT_STAT_ST_ATIM_ST__TIM_TV_NSEC
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim.st__tim.tv_nsec)
#endif

/* Return the nanosecond component of *ST's status change time. */
static inline long int
get_stat_ctime_ns (struct stat const *st)
{
# if defined STAT_TIMESPEC
   return STAT_TIMESPEC (st, st_ctim).tv_nsec;
# elif defined STAT_TIMESPEC_NS
   return STAT_TIMESPEC_NS (st, st_ctim);
# else
   return 0;
# endif
}
[..]
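
For the simple glibc-only case, without the portability layer above, a minimal sketch could look like the one below. It assumes a glibc new enough to expose st_mtim for _POSIX_C_SOURCE >= 200809L; file_mtime() is a made-up helper name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* ask for st_atim/st_mtim/st_ctim */
#endif

#include <assert.h>
#include <sys/stat.h>

/*
 * Store the modification time of path, split into seconds and
 * nanoseconds. Returns 0 on success, -1 if stat() fails.
 */
int file_mtime(const char *path, long long *sec, long *nsec)
{
        struct stat st;

        if (stat(path, &st) != 0)
                return -1;
        *sec = (long long)st.st_mtim.tv_sec;
        *nsec = st.st_mtim.tv_nsec;     /* 0 if the fs has no subsecond times */
        return 0;
}
```

The feature test macro has to be defined before the first #include, otherwise it has no effect on what the headers expose.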

Monday, May 28, 2012

GCC vector types

Apparently GCC has vector types for the C language. Moreover, GCC 4.7 allows
a scalar (a "vector generating element") to be used as an operand in operations on vector types.

Suppose we have

typedef int intvec __attribute__ ((vector_size (32)));

intvec a, b = {1,2,3,4,5,6,7,8};

An operation with a vector generating element looks like:

a = b + 2;

which literally means:

vector a{2,2,2,2,2,2,2,2} + vector b{1,2,3,4,5,6,7,8}
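
A small sketch to try this out follows. add_scalar8() is a hypothetical helper; it copies data in and out of the vector with memcpy() so that no vector is passed by value across a function boundary, which keeps the example independent of the target's vector ABI.

```c
#include <assert.h>
#include <string.h>

typedef int intvec __attribute__ ((vector_size (32)));

/* Add the scalar k to each of the 8 ints in 'in' and store to 'out'. */
void add_scalar8(const int in[8], int k, int out[8])
{
        intvec a, b;

        memcpy(&b, in, sizeof(b));
        a = b + k;      /* k is broadcast to all 8 elements */
        memcpy(out, &a, sizeof(a));
}
```

Calling it with {1,2,3,4,5,6,7,8} and k = 2 gives {3,4,5,6,7,8,9,10}.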

As often happens, the neat stuff is hidden behind the scenes. Part of the generated
assembler code:

    movl   $0x1,0x48(%rsp)
    movl   $0x2,0x4c(%rsp)
    movl   $0x3,0x50(%rsp)
    movl   $0x4,0x54(%rsp)
    movl   $0x5,0x58(%rsp)
    movl   $0x6,0x5c(%rsp)
    movl   $0x7,0x60(%rsp)
    movl   $0x8,0x64(%rsp)
    movdqa 0x48(%rsp),%xmm1
    movl   $0x2,-0x38(%rsp)
    movl   $0x2,-0x34(%rsp)
    movl   $0x2,-0x30(%rsp)
    movl   $0x2,-0x2c(%rsp)
    movl   $0x2,-0x28(%rsp)
    movl   $0x2,-0x24(%rsp)
    movl   $0x2,-0x20(%rsp)
    movl   $0x2,-0x1c(%rsp)
    movdqa -0x38(%rsp),%xmm0
    paddd %xmm0,%xmm1
    movdqa 0x58(%rsp),%xmm2
    movl   $0x2,-0x58(%rsp)
    movl   $0x2,-0x54(%rsp)
    movl   $0x2,-0x50(%rsp)
    movl   $0x2,-0x4c(%rsp)
    movl   $0x2,-0x48(%rsp)
    movl   $0x2,-0x44(%rsp)
    movl   $0x2,-0x40(%rsp)
    movl   $0x2,-0x3c(%rsp)

I really like how they utilize the CPU's out-of-order execution and data prefetching
by filling the pipeline with mov-s that have a high probability of executing simultaneously,
instead of emitting several loops.

"The core's ability to execute instructions out of order is a key factor in enabling
parallelism. This feature enables the processor to reorder instructions so that if
one µop is delayed while waiting for data or a contended resource, other µops that
appear later in the program order may proceed. This implies that when one portion
of the pipeline experiences a delay, the delay may be covered by other operations
executing in parallel or by the execution of µops queued up in a buffer."

A good example of the same approach is glibc's strncmp:

int
STRNCMP (const char *s1, const char *s2, size_t n)
{
   unsigned char c1 = '\0';
   unsigned char c2 = '\0';

   if (n >= 4)
     {
       size_t n4 = n >> 2;
       do
         {
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
         } while (--n4 > 0);
       n &= 3;
     }

   while (n > 0)
     {
       c1 = (unsigned char) *s1++;
       c2 = (unsigned char) *s2++;
       if (c1 == '\0' || c1 != c2)
         return c1 - c2;
       n--;
     }
   return c1 - c2;
}

Tuesday, April 17, 2012

GCC: new inter-procedural constant propagation pass

In GCC 4.7 "the inter-procedural constant propagation pass" has been rewritten, which
brought generic function specialization to the C world.

Suppose, we have the following example:

void foo(bool f)
{
        char *p;
        if (f) {
                p = (char *) malloc(256);
                if (!p) {
                        printf("Mem alloc error\n");
                        return;
                }
                /* something extremely valuable */
                free(p);
        } else {
                p = (char *) malloc(256);
                if (!p) {
                        printf("Mem alloc error\n");
                        return;
                }
                /* something very important */
                free(p);
        }
}

int main()
{
         foo(TRUE);
         foo(FALSE);
         return 0;
}

GCC is now able to produce the following main():

(gdb) disassemble main
<+0>:    push   %rax
<+1>:    callq 0x4005cc <foo.part.0>
<+6>:    xor    %edi,%edi
<+8>:    callq 0x400649 <foo>
<+13>:    xor    %eax,%eax
<+15>:    pop    %rdx
<+16>:    retq

Note that GCC has actually generated two functions:
-- foo.part.0 at 0x4005cc
-- foo at 0x400649

one for each possible argument value: TRUE and FALSE.

Since the compiler now changes the function call code, it has to protect itself from "incorrect"
outside calls, e.g. foo(TRUE) instead of foo.part.0(TRUE).

That's the reason the "master" copy of foo() has a check at the beginning:
(gdb) disassemble foo
<+0>:    test   %dil,%dil
<+3>:    je     0x400653 <foo+10>
<+5>:    jmpq   0x4005cc <foo.part.0>
<+10>:    push   %rbx
<+11>:    mov    $0x40079e,%edi
[...]

Thursday, March 29, 2012

ext4 noacl mount option

Brace your mtabs/fstabs/proc/self/mounts!
The error line makes it pretty clear:

EXT4-fs: Mount option "noacl" will be removed by 3.5
Contact linux-ext4@vger.kernel.org if you think we should keep it.

Saturday, March 24, 2012

gcc 4.7.0

Hm, is it only happening to me that the same kernel built with gcc 4.7.0
is 11 MiB bigger than the one built with gcc 4.6.0...

Wednesday, March 14, 2012

Josuttis on C++11

Nicolai Josuttis talks about C++ and why C++ is not his favorite programming
language

Nicolai Josuttis: You know, I didn’t follow the standardization process of C++11. At the end of 2008 I looked first into the new standard by comparing the C++98/03 versions of classes, such as pair and vector, with their new versions. I was shocked. I had trouble understanding what I found: “What the hell does && mean in declarations?” So if you ask me about the difference, my first answer is: Everything is different! The way you write simple programs and the way you define complicated classes have changed dramatically. C++11's pair<>, for instance, doubled the number of lines.

Nevertheless, the changes go in the right direction. C++11 consequently focuses on the power of C++ -- performance. However, it still has the drawback of making it even harder for programmers to design good classes. Well, it might not be harder, if you know all you have to know; but it’s harder to know all you have to know, now. To some extent, C++11 is a new language, and my new edition simply reflects this change, covering both the new C++ programming style and new classes.

Please read the full story at informit.com

Friday, January 20, 2012

C++11 ratio

20.10.2 Header synopsis

quote from cppreference

template<
    std::intmax_t Num,
    std::intmax_t Denom = 1
> class ratio;

The class template std::ratio provides compile-time rational arithmetic support. Each instantiation of this template exactly represents any finite rational number as long as its numerator Num and denominator Denom are representable as compile-time constants of type std::intmax_t.

typedefs:

typedef ratio<1, 1000000000000000000000000> yocto;
typedef ratio<1,    1000000000000000000000> zepto;
typedef ratio<1,       1000000000000000000> atto;
typedef ratio<1,          1000000000000000> femto;
typedef ratio<1,             1000000000000> pico;
typedef ratio<1,                1000000000> nano;
typedef ratio<1,                   1000000> micro;
typedef ratio<1,                      1000> milli;
typedef ratio<1,                       100> centi;
typedef ratio<1,                        10> deci;
typedef ratio<                       10, 1> deca;
typedef ratio<                      100, 1> hecto;
typedef ratio<                     1000, 1> kilo;
typedef ratio<                  1000000, 1> mega;
typedef ratio<               1000000000, 1> giga;
typedef ratio<            1000000000000, 1> tera;
typedef ratio<         1000000000000000, 1> peta;
typedef ratio<      1000000000000000000, 1> exa;
typedef ratio<   1000000000000000000000, 1> zetta;
typedef ratio<1000000000000000000000000, 1> yotta;

Wednesday, January 18, 2012

ReFS

Microsoft talks a bit about ReFS -- a new server-side file system with some
ZFS and Btrfs features.

Phoronix has some good notes for those who are interested in a short list of
key features:

- Unlike NTFS, Microsoft ReFS does share some common traits with Btrfs. ReFS is copy-on-write, provides integrity checksums / ECC, extended attributes, and B-trees. The Sun/Oracle ZFS file-system also shares most of the same features. The storage engine of ReFS is using B+ trees exclusively compared to normal B-trees in Btrfs, with the difference of the plus variant being records stored at the leaf-level of the tree and keys being within the interior nodes.

- ReFS has similar volume/file/directory size limits to EXT4 and Btrfs.

- At least for Windows 8, Microsoft is not providing any upgrade path from NTFS to ReFS, but requires re-formatting the drive and copying any data. Within Windows 8, ReFS is also not supported as a boot partition or for use on removable media/drives.

- Below are the official "key features of ReFS" as said by Microsoft.

  - Metadata integrity with checksums
  - Integrity streams providing optional user data integrity
  - Allocate on write transactional model for robust disk updates (also known as copy on write)
  - Large volume, file and directory sizes
  - Storage pooling and virtualization makes file system creation and management easy
  - Data striping for performance (bandwidth can be managed) and redundancy for fault tolerance
  - Disk scrubbing for protection against latent disk errors
  - Resiliency to corruptions with "salvage" for maximum volume availability in all cases
  - Shared storage pools across machines for additional failure tolerance and load balancing

- ReFS does not support data de-duplication, copy-on-write snapshots (a ZFS and Btrfs feature, but ReFS snapshots can be done when paired with the Microsoft Storage Spaces), file-level encryption (dropped from NTFS), or compression (Btrfs can now do Gzip, LZO, and Snappy).

- In what may partially help supporting ReFS in Linux and other platforms, Steven Sinofsky of Microsoft says, "data stored on ReFS is accessible through the same file access APIs on clients that are used on any operating system that can access today’s NTFS volumes." The upper-layer engine is nearly the same as what's found in NTFS, but it's the underlying on-disk storage engine and format that's changed with ReFS.
Read the full story

Thursday, January 12, 2012

GnuPG MPILIB

Looks like Linux kernel 3.3 will include GnuPG's MPILIB.

Multiprecision maths library (MPILIB) [N/m/y/?] (NEW) ?

CONFIG_MPILIB:

Multiprecision maths library from GnuPG.
It is used to implement RSA digital signature verification,
which is used by IMA/EVM digital signature extension.

Symbol: MPILIB [=n]
Type : tristate
Prompt: Multiprecision maths library
Defined at lib/Kconfig:288
Location:
-> Library routines
Selected by: DIGSIG [=n] && KEYS [=y]

In-kernel signature checker (DIGSIG) [N/m/y/?] (NEW) ?

CONFIG_DIGSIG:

Digital signature verification. Currently only RSA is supported.
Implementation is done using GnuPG MPI library

Symbol: DIGSIG [=n]
Type : tristate
Prompt: In-kernel signature checker
Defined at lib/Kconfig:305
Depends on: KEYS [=y]
Location:
-> Library routines
Selects: MPILIB [=m]
Selected by: INTEGRITY_DIGSIG [=n] && INTEGRITY [=n] && KEYS [=y]

Saturday, December 31, 2011

Happy New Year!

image via the mockturtle