sergey senozhatsky's blog: 2012

Thursday, December 27, 2012

stat() nanosecond time resolution

$ stat foo
[..]
Access: 2012-12-27 15:07:08.020388663 +0300
Modify: 2012-12-27 15:07:08.020388663 +0300
Change: 2012-12-27 15:07:08.020388663 +0300

Where does the nanosecond precision come from?

man 2 stat

           struct stat {
               dev_t     st_dev;     /* ID of device containing file */
               ino_t     st_ino;     /* inode number */
               mode_t    st_mode;    /* protection */
               nlink_t   st_nlink;   /* number of hard links */
               uid_t     st_uid;     /* user ID of owner */
               gid_t     st_gid;     /* group ID of owner */
               dev_t     st_rdev;    /* device ID (if special file) */
               off_t     st_size;    /* total size, in bytes */
               blksize_t st_blksize; /* blocksize for file system I/O */
               blkcnt_t st_blocks; /* number of 512B blocks allocated */
               time_t    st_atime;   /* time of last access */
               time_t    st_mtime;   /* time of last modification */
               time_t    st_ctime;   /* time of last status change */
           };

As we can see, struct stat keeps atime/mtime/ctime as time_t, which counts whole
seconds, so it's quite unlikely to store nanosecond resolution.

The NOTES section of the man page says:

Since kernel 2.5.48, the stat structure supports nanosecond resolution
for the three file timestamp fields. Glibc exposes the nanosecond component
of each field using names of the form st_atim.tv_nsec if the _BSD_SOURCE
or _SVID_SOURCE feature test macro is defined. These fields are specified in
POSIX.1-2008, and, starting with version 2.12, glibc also exposes these field
names if _POSIX_C_SOURCE is defined with the value 200809L or greater, or
_XOPEN_SOURCE is defined with the value 700 or greater. If none of the
aforementioned macros are defined, then the nanosecond values are exposed
with names of the form st_atimensec. On file systems that do not support
subsecond timestamps, the nanosecond fields are returned with the value 0.
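
To make the note above concrete, here is a minimal sketch of reading the nanosecond part through the POSIX.1-2008 field names. It assumes glibc 2.12 or newer on Linux; file_mtime() and print_mtime() are hypothetical helper names made up for this example, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L  /* ask glibc for the POSIX.1-2008 st_*tim names */

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper (not part of any library): stores the last
 * modification time of `path`, seconds plus nanoseconds, in *ts.
 * Returns 0 on success and -1 if stat() fails.
 */
int file_mtime(const char *path, struct timespec *ts)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    *ts = st.st_mtim;   /* tv_sec is what st_mtime expands to */
    return 0;
}

/* Prints seconds and the 9-digit fractional part, similar to stat(1). */
void print_mtime(const char *path)
{
    struct timespec ts;

    if (file_mtime(path, &ts) == 0)
        printf("%s: %lld.%09ld\n", path, (long long) ts.tv_sec, ts.tv_nsec);
    else
        perror(path);
}
```

Calling print_mtime("foo") should print the same fractional part that stat(1) shows in its Modify line. On a file system without subsecond timestamps the fraction is simply 0.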

GLIBC defines struct stat as

struct stat
[..]
#if defined __USE_MISC || defined __USE_XOPEN2K8
      /* Nanosecond resolution timestamps are stored in a format
         equivalent to 'struct timespec'. This is the type used
         whenever possible but the Unix namespace rules do not allow the
         identifier 'timespec' to appear in the header.
         Therefore we have to handle the use of this header in strictly
         standard-compliant sources special. */
      struct timespec st_atim;          /* Time of last access. */
      struct timespec st_mtim;          /* Time of last modification. */
      struct timespec st_ctim;          /* Time of last status change. */
# define st_atime st_atim.tv_sec       /* Backward compatibility. */
# define st_mtime st_mtim.tv_sec
# define st_ctime st_ctim.tv_sec
#else
      __time_t st_atime;               /* Time of last access. */
      unsigned long int st_atimensec;    /* Nscecs of last access. */
      __time_t st_mtime;               /* Time of last modification. */
      unsigned long int st_mtimensec;    /* Nsecs of last modification. */
      __time_t st_ctime;               /* Time of last status change. */
      unsigned long int st_ctimensec;    /* Nsecs of last status change. */
#endif
[..]

The __USE_MISC and __USE_XOPEN2K8 definitions come from

glibc/include/features.h

#if (_POSIX_C_SOURCE - 0) >= 200809L
# define __USE_XOPEN2K8       1
# undef _ATFILE_SOURCE
# define _ATFILE_SOURCE 1
#endif

or

# if (_XOPEN_SOURCE - 0) >= 600
#   if (_XOPEN_SOURCE - 0) >= 700
#    define __USE_XOPEN2K8     1
#    define __USE_XOPEN2K8XSI 1
#   endif

#if defined _BSD_SOURCE || defined _SVID_SOURCE
# define __USE_MISC    1
#endif

So it seems that defining _BSD_SOURCE or _SVID_SOURCE (which sets __USE_MISC), or
_POSIX_C_SOURCE / _XOPEN_SOURCE with a high enough value (which sets __USE_XOPEN2K8),
is sufficient. However, the real world code(tm) is a bit more complicated. Here is
a small part of stat-time.h, as used by coreutils' stat:

[..]
/* STAT_TIMESPEC (ST, ST_XTIM) is the ST_XTIM member for *ST of type
    struct timespec, if available. If not, then STAT_TIMESPEC_NS (ST,
    ST_XTIM) is the nanosecond component of the ST_XTIM member for *ST,
    if available. ST_XTIM can be st_atim, st_ctim, st_mtim, or st_birthtim
    for access, status change, data modification, or birth (creation)
    time respectively.

    These macros are private to stat-time.h. */
#if defined HAVE_STRUCT_STAT_ST_ATIM_TV_NSEC
# ifdef TYPEOF_STRUCT_STAT_ST_ATIM_IS_STRUCT_TIMESPEC
# define STAT_TIMESPEC(st, st_xtim) ((st)->st_xtim)
# else
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim.tv_nsec)
# endif
#elif defined HAVE_STRUCT_STAT_ST_ATIMESPEC_TV_NSEC
# define STAT_TIMESPEC(st, st_xtim) ((st)->st_xtim##espec)
#elif defined HAVE_STRUCT_STAT_ST_ATIMENSEC
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim##ensec)
#elif defined HAVE_STRUCT_STAT_ST_ATIM_ST__TIM_TV_NSEC
# define STAT_TIMESPEC_NS(st, st_xtim) ((st)->st_xtim.st__tim.tv_nsec)
#endif

/* Return the nanosecond component of *ST's status change time. */
static inline long int
get_stat_ctime_ns (struct stat const *st)
{
# if defined STAT_TIMESPEC
   return STAT_TIMESPEC (st, st_ctim).tv_nsec;
# elif defined STAT_TIMESPEC_NS
   return STAT_TIMESPEC_NS (st, st_ctim);
# else
   return 0;
# endif
}
[..]

Monday, May 28, 2012

GCC vector types

Apparently GCC has vector types for the C language. Moreover, GCC 4.7 allows
a scalar operand (a "vector generating element") in operations on vector types:
the scalar is expanded into a vector whose elements all equal that scalar.

Suppose we have

typedef int intvec __attribute__ ((vector_size (32)));

intvec a, b = {1,2,3,4,5,6,7,8};

An operation with a vector generating element looks like this:

a = b + 2;

which literally means:

a = vector {2,2,2,2,2,2,2,2} + vector b{1,2,3,4,5,6,7,8}
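
A small sketch of the same operation wrapped in a function, assuming GCC 4.7 or newer. The helper name add_scalar8() is made up for this example. Plain int arrays are used at the interface only to avoid passing 32-byte vectors by value, which GCC warns about on targets without AVX.

```c
#include <string.h>

/* 32 bytes / sizeof(int) = 8 elements, same typedef as in the post. */
typedef int intvec __attribute__ ((vector_size (32)));

/*
 * Hypothetical helper: adds the scalar s to each of the 8 ints in `in`
 * and writes the 8 sums to `out`.
 */
void add_scalar8(const int in[8], int s, int out[8])
{
    intvec v;

    memcpy(&v, in, sizeof v);   /* load the array into a vector */
    v = v + s;                  /* s is expanded to {s,s,s,s,s,s,s,s} */
    memcpy(out, &v, sizeof v);  /* store the result back */
}
```

With in = {1,2,3,4,5,6,7,8} and s = 2, out becomes {3,4,5,6,7,8,9,10}.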

As often happens, the neat stuff is hidden behind the scenes. Part of the generated
assembly code:

    movl   $0x1,0x48(%rsp)
    movl   $0x2,0x4c(%rsp)
    movl   $0x3,0x50(%rsp)
    movl   $0x4,0x54(%rsp)
    movl   $0x5,0x58(%rsp)
    movl   $0x6,0x5c(%rsp)
    movl   $0x7,0x60(%rsp)
    movl   $0x8,0x64(%rsp)
    movdqa 0x48(%rsp),%xmm1
    movl   $0x2,-0x38(%rsp)
    movl   $0x2,-0x34(%rsp)
    movl   $0x2,-0x30(%rsp)
    movl   $0x2,-0x2c(%rsp)
    movl   $0x2,-0x28(%rsp)
    movl   $0x2,-0x24(%rsp)
    movl   $0x2,-0x20(%rsp)
    movl   $0x2,-0x1c(%rsp)
    movdqa -0x38(%rsp),%xmm0
    paddd %xmm0,%xmm1
    movdqa 0x58(%rsp),%xmm2
    movl   $0x2,-0x58(%rsp)
    movl   $0x2,-0x54(%rsp)
    movl   $0x2,-0x50(%rsp)
    movl   $0x2,-0x4c(%rsp)
    movl   $0x2,-0x48(%rsp)
    movl   $0x2,-0x44(%rsp)
    movl   $0x2,-0x40(%rsp)
    movl   $0x2,-0x3c(%rsp)

I really like how the compiler exploits the CPU's out-of-order execution and data
prefetching here: instead of emitting several loops, it fills the pipeline with
independent movs that have a high probability of executing simultaneously.

"The core's ability to execute instructions out of order is a key factor in enabling
parallelism. This feature enables the processor to reorder instructions so that if
one µop is delayed while waiting for data or a contended resource, other µops that
appear later in the program order may proceed. This implies that when one portion
of the pipeline experiences a delay, the delay may be covered by other operations
executing in parallel or by the execution of µops queued up in a buffer."

A good example of the same idea, unrolled by hand, is glibc's strncmp:

int
STRNCMP (const char *s1, const char *s2, size_t n)
{
   unsigned char c1 = '\0';
   unsigned char c2 = '\0';

   if (n >= 4)
     {
       size_t n4 = n >> 2;
       do
         {
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
           c1 = (unsigned char) *s1++;
           c2 = (unsigned char) *s2++;
           if (c1 == '\0' || c1 != c2)
             return c1 - c2;
         } while (--n4 > 0);
       n &= 3;
     }

   while (n > 0)
     {
       c1 = (unsigned char) *s1++;
       c2 = (unsigned char) *s2++;
       if (c1 == '\0' || c1 != c2)
         return c1 - c2;
       n--;
     }
   return c1 - c2;
}

Tuesday, April 17, 2012

GCC: new inter-procedural constant propagation pass

GCC 4.7 comes with a rewritten inter-procedural constant propagation pass, which
brings generic function specialization to the C world.

Suppose we have the following example:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

void foo(bool f)
{
        char *p;
        if (f) {
                p = (char *) malloc(256);
                if (!p) {
                        printf("Mem alloc error\n");
                        return;
                }
                /* something extremely valuable */
                free(p);
        } else {
                p = (char *) malloc(256);
                if (!p) {
                        printf("Mem alloc error\n");
                        return;
                }
                /* something very important */
                free(p);
        }
}

int main()
{
        foo(true);
        foo(false);
        return 0;
}

GCC is now able to produce the following main():

(gdb) disassemble main
<+0>:    push   %rax
<+1>:    callq 0x4005cc <foo.part.0>
<+6>:    xor    %edi,%edi
<+8>:    callq 0x400649 <foo>
<+13>:    xor    %eax,%eax
<+15>:    pop    %rdx
<+16>:    retq

Note that GCC has actually generated two functions:
-- foo.part.0 at 0x4005cc
-- foo at 0x400649

one for each possible argument value, true and false.

(Strictly speaking, the .part.N suffix is what GCC's function splitting pass, the
one behind partial inlining, gives to an outlined piece of a function; clones
created by the constant propagation pass itself are named .constprop.N.)

Since the compiler now rewrites the call sites, it has to protect itself from
"incorrect" outside calls, e.g. a caller that still invokes foo() with a true
argument instead of calling foo.part.0 directly.

That's the reason the "master" copy of foo() tests its argument at the very beginning:
(gdb) disassemble foo
<+0>:    test   %dil,%dil
<+3>:    je     0x400653 <foo+10>
<+5>:    jmpq   0x4005cc <foo.part.0>
<+10>:    push   %rbx
<+11>:    mov    $0x40079e,%edi
[...]
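
Translated back into C, the result looks roughly like the sketch below. This is only an illustration of the disassembly above, not GCC output: foo_part_0() and call_both() are made-up names ('.' is not valid in a C identifier), and the int return values are an assumption added here so that the behaviour can be observed; the original foo() returns void.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for foo.part.0: the f == true half of foo().
 * It takes no argument because the value of f is already known.
 * Returns 1 on success and -1 on allocation failure.
 */
static int foo_part_0(void)
{
    char *p = malloc(256);

    if (!p) {
        printf("Mem alloc error\n");
        return -1;
    }
    /* something extremely valuable */
    free(p);
    return 1;
}

/*
 * The "master" copy: tests f first and hands the true case over to
 * foo_part_0(), mirroring the test/je/jmpq sequence in the disassembly.
 * Returns 0 when the f == false half ran successfully.
 */
int foo(bool f)
{
    char *p;

    if (f)
        return foo_part_0();

    p = malloc(256);
    if (!p) {
        printf("Mem alloc error\n");
        return -1;
    }
    /* something very important */
    free(p);
    return 0;
}

/* What main() boils down to: one direct call into each half. */
int call_both(void)
{
    int first = foo_part_0();   /* was foo(true)  */
    int second = foo(false);    /* was foo(false) */

    return first * 10 + second;
}
```

The point of the split is visible in call_both(): the call that used to pass true no longer passes an argument and skips the test entirely, while any other caller of foo() still gets the correct behaviour.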

Thursday, March 29, 2012

ext4 noacl mount option

Brace your mtabs/fstabs/proc/self/mounts!
The kernel message makes it pretty clear:

EXT4-fs: Mount option "noacl" will be removed by 3.5
Contact linux-ext4@vger.kernel.org if you think we should keep it.

Saturday, March 24, 2012

gcc 4.7.0

Hm, is it only me, or is the same kernel built with gcc 4.7.0
11 MiB bigger than the one built with gcc 4.6.0...

Wednesday, March 14, 2012

Josuttis on C++11

Nicolai Josuttis talks about C++ and why C++ is not his favorite programming
language

Nicolai Josuttis: You know, I didn’t follow the standardization process of C++11. At the end of 2008 I looked first into the new standard by comparing the C++98/03 versions of classes, such as pair and vector, with their new versions. I was shocked. I had trouble understanding what I found: “What the hell does && mean in declarations?” So if you ask me about the difference, my first answer is: Everything is different! The way you write simple programs and the way you define complicated classes have changed dramatically. C++11's pair<>, for instance, doubled the number of lines.

Nevertheless, the changes go in the right direction. C++11 consequently focuses on the power of C++ -- performance. However, it still has the drawback of making it even harder for programmers to design good classes. Well, it might not be harder, if you know all you have to know; but it’s harder to know all you have to know, now. To some extent, C++11 is a new language, and my new edition simply reflects this change, covering both the new C++ programming style and new classes.

Please read the full story at informit.com

Friday, January 20, 2012

C++11 ratio

20.10.2 Header synopsis

quote from cppreference

template<
    std::intmax_t Num,
    std::intmax_t Denom = 1
> class ratio;

The class template std::ratio provides compile-time rational arithmetic support. Each instantiation of this template exactly represents any finite rational number as long as its numerator Num and denominator Denom are representable as compile-time constants of type std::intmax_t.

typedefs:

typedef ratio<1, 1000000000000000000000000> yocto;
typedef ratio<1,    1000000000000000000000> zepto;
typedef ratio<1,       1000000000000000000> atto;
typedef ratio<1,          1000000000000000> femto;
typedef ratio<1,             1000000000000> pico;
typedef ratio<1,                1000000000> nano;
typedef ratio<1,                   1000000> micro;
typedef ratio<1,                      1000> milli;
typedef ratio<1,                       100> centi;
typedef ratio<1,                        10> deci;
typedef ratio<                       10, 1> deca;
typedef ratio<                      100, 1> hecto;
typedef ratio<                     1000, 1> kilo;
typedef ratio<                  1000000, 1> mega;
typedef ratio<               1000000000, 1> giga;
typedef ratio<            1000000000000, 1> tera;
typedef ratio<         1000000000000000, 1> peta;
typedef ratio<      1000000000000000000, 1> exa;
typedef ratio<   1000000000000000000000, 1> zetta;
typedef ratio<1000000000000000000000000, 1> yotta;

Wednesday, January 18, 2012

ReFS

Microsoft talks a bit about ReFS -- a new server side file system with some
ZFS and Btrfs features.

Phoronix has some good notes for those who are interested in a short list of
key features:

- Unlike NTFS, Microsoft ReFS does share some common traits with Btrfs. ReFS is copy-on-write, provides integrity checksums / ECC, extended attributes, and B-trees. The Sun/Oracle ZFS file-system also shares most of the same features. The storage engine of ReFS is using B+ trees exclusively compared to normal B-trees in Btrfs, with the difference of the plus variant being records stored at the leaf-level of the tree and keys being within the interior nodes.

- ReFS has similar volume/file/directory size limits to EXT4 and Btrfs.

- At least for Windows 8, Microsoft is not providing any upgrade path from NTFS to ReFS, but requires re-formatting the drive and copying any data. Within Windows 8, ReFS is also not supported as a boot partition or for use on removable media/drives.

- Below are the official "key features of ReFS" as said by Microsoft.

  - Metadata integrity with checksums
  - Integrity streams providing optional user data integrity
  - Allocate on write transactional model for robust disk updates (also known as copy on write)
  - Large volume, file and directory sizes
  - Storage pooling and virtualization makes file system creation and management easy
  - Data striping for performance (bandwidth can be managed) and redundancy for fault tolerance
  - Disk scrubbing for protection against latent disk errors
  - Resiliency to corruptions with "salvage" for maximum volume availability in all cases
  - Shared storage pools across machines for additional failure tolerance and load balancing

- ReFS does not support data de-duplication, copy-on-write snapshots (a ZFS and Btrfs feature, but ReFS snapshots can be done when paired with the Microsoft Storage Spaces), file-level encryption (dropped from NTFS), or compression (Btrfs can now do Gzip, LZO, and Snappy).

- In what may partially help supporting ReFS in Linux and other platforms, Steven Sinofsky of Microsoft says, "data stored on ReFS is accessible through the same file access APIs on clients that are used on any operating system that can access today’s NTFS volumes." The upper-layer engine is nearly the same as what's found in NTFS, but it's the underlying on-disk storage engine and format that's changed with ReFS.

Thursday, January 12, 2012

GnuPG MPILIB

Looks like Linux kernel 3.3 will include the GnuPG multiprecision library (MPILIB).

Multiprecision maths library (MPILIB) [N/m/y/?] (NEW) ?

CONFIG_MPILIB:

Multiprecision maths library from GnuPG.
It is used to implement RSA digital signature verification,
which is used by IMA/EVM digital signature extension.

Symbol: MPILIB [=n]
Type : tristate
Prompt: Multiprecision maths library
Defined at lib/Kconfig:288
Location:
-> Library routines
Selected by: DIGSIG [=n] && KEYS [=y]

In-kernel signature checker (DIGSIG) [N/m/y/?] (NEW) ?

CONFIG_DIGSIG:

Digital signature verification. Currently only RSA is supported.
Implementation is done using GnuPG MPI library

Symbol: DIGSIG [=n]
Type : tristate
Prompt: In-kernel signature checker
Defined at lib/Kconfig:305
Depends on: KEYS [=y]
Location:
-> Library routines
Selects: MPILIB [=m]
Selected by: INTEGRITY_DIGSIG [=n] && INTEGRITY [=n] && KEYS [=y]
