Wednesday, August 6, 2008

c/c++: embed binary data into elf v.2

In the previous post I described how to embed data into an object file.
Another option is to store the data in a C/C++ array.
Again, I'll use data.txt:

$cat data.txt 
data file
To create a source file with this data I'll use the xxd utility:
$xxd -i data.txt data.c
$cat data.c
unsigned char data_txt[] = {
  0x64, 0x61, 0x74, 0x61, 0x20, 0x66, 0x69, 0x6c, 0x65, 0x0a
};
unsigned int data_txt_len = 10;

A simple C source file that uses this array looks like this:
#include <stdio.h>

extern unsigned char data_txt[];
extern unsigned int data_txt_len;

int
main(int argc, char **argv)
{
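    /* data_txt_len is unsigned, so print it with %u */
    printf("%u\n", data_txt_len);
    /* the array is not NUL-terminated: write exactly data_txt_len bytes */
    fwrite(data_txt, 1, data_txt_len, stdout);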

    return 0;
}
To compile:
gcc test.c data.c

c/c++: embed binary data into elf

It's a great idea to store program data somewhere outside the binary.
It can then be modified to change the program's behavior or for rebranding.

But sometimes you want to keep some data immutable, hidden inside the executable binary.

Help text is one example. If you don't want to have something like

void
usage (status)
     int status;
{
  fprintf (status ? stderr : stdout, "\
Usage: %s [-nV] [--quiet] [--silent] [--version] [-e script]\n\
        [-f script-file] [--expression=script] [--file=script-file] [file...]\n",
       myname);
  exit (status);
}
and you don't want this help text to be stored in a separate file, you can simply embed binary data into your executable.

Suppose you have data.txt:
$cat data.txt 
data file
You have to convert it to an ELF object file.
I know two ways:
  • use linker:

    ld -r -b binary -o data.o data.txt
  • use objcopy:

    objcopy -I binary -O elf32-i386 --binary-architecture i386 data.txt data.o

Both of these commands produce an ELF object:
$readelf -a data.o 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          96 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         5
  Section header string table index: 2

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .data             PROGBITS        00000000 000034 00000a 00  WA  0   0  1
  [ 2] .shstrtab         STRTAB          00000000 00003e 000021 00      0   0  1
  [ 3] .symtab           SYMTAB          00000000 000128 000050 10      4   2  4
  [ 4] .strtab           STRTAB          00000000 000178 000043 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Symbol table '.symtab' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 SECTION LOCAL  DEFAULT    1 
     2: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_start
     3: 0000000a     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_end
     4: 0000000a     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_data_txt_size

Ok, you have data.o with your data in the .data section and three symbols: _binary_data_txt_start, _binary_data_txt_end and _binary_data_txt_size.

In this unlinked object _binary_data_txt_end and _binary_data_txt_size have the same value (0xa), because the section starts at address 0. So I'll use _binary_data_txt_size only.
Let's write a simple C program that uses the data from the object. It's a bit tricky.
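#include <stdio.h>
#include <stddef.h>

/* Symbols created by ld/objcopy; only their addresses are meaningful. */
extern char _binary_data_txt_start;
extern char _binary_data_txt_size;

int
main(int argc, char **argv)
{
    /* The "address" of the absolute symbol is the data size itself. */
    size_t size = (size_t)&_binary_data_txt_size;
    char *data = &_binary_data_txt_start;

    printf("%lu\n", (unsigned long)size);
    /* The data is not NUL-terminated, so write exactly size bytes. */
    fwrite(data, 1, size, stdout);

    return 0;
}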
_binary_data_txt_start and _binary_data_txt_size carry their values in their addresses, not in any storage behind them. So &_binary_data_txt_size is not the address of a variable but the value of the symbol, which is the size of the data, and &_binary_data_txt_start is the address of the data.

To compile:

gcc test.c data.o

VM networking: QEMU and VMware

Sometimes you have to work with qemu and VMware virtual machines at the same time. Moreover, you want these machines to be visible to each other over the network.

To set up a shared network environment for qemu and VMware, the kernel has to support TUN/TAP interfaces and bridge interfaces.
Enable TUN/TAP and Ethernet bridging support in the kernel configuration:

Device Drivers  --->
   Networking support  --->
      <*> Universal TUN/TAP device driver support
Networking  --->
   Networking options  --->
      <*> 802.1d Ethernet Bridging #NOTE : at least for 2.6.20 series
Ensure that the /dev/net/tun char device exists and that the qemu user can read and write it.

Start the vmnet (usually vmnet8) interface.

Put vmnet8 into promiscuous mode:
ifconfig vmnet8 promisc
Set up the bridge interface:
brctl addbr br0
Add the vmnet8 interface to the bridge:
brctl addif br0 vmnet8
Run the VMware VM.

Create the file /etc/qemu-ifup with:
#!/bin/sh
sudo /etc/qemu-ifup-sudo "$@"
Create the file /etc/qemu-ifup-sudo with:
#!/bin/sh
/sbin/ifconfig "$1" 0.0.0.0 promisc up
/usr/sbin/brctl addif br0 "$1"
Make both files executable and add the qemu user to /etc/sudoers so that it is allowed to run /etc/qemu-ifup-sudo.

Run the qemu VM:
qemu -hda linux.img -net nic,macaddr=52:54:00:12:34:57 -net tap
Every new qemu VM instance must be given a different macaddr!

source fetcher

Recently I ran into a problem with updating the docs and code examples on libdodo's site. Each time I publish a release I have to update the docs and code examples. It takes time to format pages for the web, update each page and so on. So I decided to write a WordPress plugin that fetches sources from the Mercurial repository, formats the code and puts it on a WordPress page. You can find the code on the source fetcher Google Code page. Install it into the 'plugins' directory in the WordPress tree and edit two settings: the URL of the repository and the tag/revision. You can browse the results on the libdodo examples page.

Tuesday, August 5, 2008

grep: locale

I spent almost an hour with "grep -RIE 'class [a-z]+[A-Z]+' *h" trying to find classes whose names begin with a lowercase letter and contain capitals.
That grep command seemed to ignore case, so I also got classes with all-lowercase names.
I dug into the man pages and found the following paragraph:

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
Yes, "LC_ALL=C grep -RIE 'class [a-z]+[A-Z]+' *h" worked for me, but I didn't expect such behavior with a UTF-8 locale.

After googling a bit I found some pages that say:
  • Collating symbols. These look like [.element.], where element is a collating element (i.e. a symbolic name for a multi-character string), and match the value of the collating element in the current locale. This doesn't seem to work in GNU grep.
  • On some locales it might include both the uppercase and lowercase of a given character. In the POSIX locale, this always expands to only the character given. 
So '[A-Z]' is only A,B,C,...,Z in the POSIX/C locale.

Monday, August 4, 2008

c++: overriding virtualization

This technique is well known to me, but only recently did I hear it called 'overriding virtualization'. So I decided to show how calling a method through an explicitly named class bypasses virtual dispatch. Suppose you have:

class A
{
public:
    virtual void m();
};

class B : public A
{
public:
    virtual void m();
};
When you create an instance of class B, a call to m dispatches to B::m, even through a pointer to A:
A *a = new B;
a->m();/// B::m here
So what if you want to call m from A? Easy:
A *a = new B;
a->A::m();/// A::m here
In the example above the qualified name A::m overrides the virtual call.