Wednesday, January 30, 2008

awesome tuple ('tcpdump','wireshark')

Sometimes you want to see the HTTP session between a browser and an HTTP server. You can use Firebug for Firefox, but then you are limited to Firefox sessions. Suppose instead that you want to capture all packets on localhost on port 8004. You'd use 'tcpdump -i lo -A -s 0 "port 8004"'

  • `-i lo` means to capture packets on the lo interface
  • `-A` means to show the contents of the packets in ASCII
  • `-s 0` means to capture the whole packet instead of truncating it to the default snapshot length
  • `port 8004` is a pcap filter expression that matches packets to or from port 8004
Now you can watch the packets sent to the server and the server's responses. Eventually you'll want to save the log and examine it later in more comfortable conditions. You'd probably want to use 'tcpdump -i lo -A -s 0 "port 8004" -w /tmp/tcp.dump'. `-w /tmp/tcp.dump` tells tcpdump to save its work as a specially formatted log in /tmp/tcp.dump (the packets are written raw, so `-A` no longer matters here).

Now you can relax and install wireshark from your distribution! It's amazing! You can do whatever you want with the log you captured previously. Just run wireshark /tmp/tcp.dump and take a tour of the amazing world of network communication ;).

That's all for today about tcpdump and wireshark. I know that's not much information, but you can find everything else in man tcpdump and man wireshark. They are too big to discuss in a blog post; just dive into their docs to figure out your options.

Thursday, January 24, 2008

What we expect from the new php

Brothers and sisters, every time I touch php I want to say 'yuck'! This time the php dev team has announced new php features; as I understand it, these are the php v6 features. What will we have?

  • GOTO operator. Yes, I didn't misspell it: GOTO. All language paradigms and all algorithm architects try to avoid goto, except the php developers =/. Here is how you can use it:
    for ($i = 0; $i < 9; $i++)
    {
            if (true) {
                    break blah;
            }
            echo "not shown";
    blah:
            echo "iteration $i\n";
    }
    They even named it 'break'. They probably think that 'break' is not 'goto' =). But yes, they BREAK common sense ;). The developers say: "Goto is currently missing in PHP, and although there is a limited use for this construct in some cases it can reduce the amount of code a lot." I should say: it doesn't matter how much code you have. What matters is how easily you can read and understand it.
  • ifsetor function (or operator?). It is meant to replace the "'condition'?'yes':'no';" construction. Here is how you can use it:
    $foo = ifsetor($_GET['foo'], 42);
    As for me, that's nonsense. It doesn't make things easier; it complicates the language. Python has a similar construction for dict: dictobject.get('key', 'defvalue'). It returns 'defvalue' if the dict has no value associated with the given 'key'. But that is not a global function or operator: it is a method of the object and, yes, it simplifies the construction "dictobject.has_key('key') and dictobject['key'] or 'defvalue'".
  • php will be unicode. At last.
  • Some functionality will be cleaned up. This is not bad, but how about old-code support? They should think first before implementing 'new functionality' that will eventually be cleaned up.
  • No more 'safe_mode'. I wonder why php had to be unsafe in some circumstances ;).
  • "We remove support for dynamic break levels." That was one of the useful php features. Every time I have to write code in another language that should break out of more than one loop, I think that php's break feature would be good there. I even implemented it in my 'libdodo' project. The reason for the removal: "break $var" doesn't really work. I think they should fix this behavior rather than remove it.
  • Named function parameters. That's good =). But I don't like the implementation:
    function foo ($a = 42, $b = 43, $c = 44, $d = 45)
    {
            echo "$a $b $c $d\n";
    }
    foo(c => 54, b => 53);
    
    I think it should be "foo($c = 54, $b = 53)" to keep the coding style.
  • Interfaces may specify the __construct() signature. "We didn't see a reason why this shouldn't be allowed, but Andi seems to have a reason for it." Genius!
  • Namespaces. A good thing. But I wonder why they are named 'modules'.
+ some other "good" and bad things. You can find the full changelog here.

Tuesday, January 22, 2008

c++: 'templated' typedef

In the current C++ standard (2003) you can't define a typedef that is itself a template. But you can define a class/structure template that contains a typedef.

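#include <map>
#include <string>

// The nested typedef stands in for a 'templated' typedef; T is the mapped type.
// (Names containing a double underscore are reserved, hence plain 'tt'.)
template<typename T>
struct tt
{
    typedef std::map<std::string, T> smap;
};
And use it:
tt<int>::smap m;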
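
For comparison, the later C++11 standard added alias templates, which express the same thing directly. The sketch below assumes a C++11 compiler; `sample_ports` and its contents are made-up names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// C++11 alias template: a typedef that takes a template parameter.
template<typename T>
using smap = std::map<std::string, T>;

// The alias names exactly the same type as the spelled-out map.
static_assert(std::is_same<smap<int>, std::map<std::string, int> >::value,
              "smap<int> is std::map<std::string, int>");

// Hypothetical helper that builds a small smap<int>.
smap<int> sample_ports()
{
    smap<int> ports;
    ports["http"] = 80;
    ports["https"] = 443;
    assert(ports.size() == 2); // two distinct keys were inserted
    return ports;
}
```

With the alias there is no wrapper structure to go through: `smap<int> m;` declares the map.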

c++: function nested inside other functions

Everybody knows that you can't define a function inside another function. I want to argue with that:

#include <iostream>

int
main(int argc, char **argv)
{
    struct Nested
    {
        static void print()
        {
            std::cout << __func__ << std::endl;
        }
    };
    Nested::print();
    
    return 0;
}
Or you can apply some macro magic:
#include <iostream>

#define NESTED_FUNCTION_BEGIN(UNIQUE_NAME) \
    struct UNIQUE_NAME { static
#define NESTED_FUNCTION_END };

int
main(int argc, char **argv)
{
    NESTED_FUNCTION_BEGIN(Nested)
    void print()
    {
        std::cout << __func__ << std::endl;
    }
    NESTED_FUNCTION_END

    Nested::print();

    return 0;
}
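
If a C++11 compiler is available, a lambda gives a similar effect without the wrapper structure. This is only a sketch under that assumption; `greet` and `decorate` are made-up names.

```cpp
#include <cassert>
#include <string>

// Hypothetical outer function that defines and uses a local function.
std::string greet(const std::string &name)
{
    assert(!name.empty());

    // The lambda is defined inside greet() and is visible only here,
    // much like the static method of the local structure.
    auto decorate = [](const std::string &text) {
        return "<" + text + ">";
    };

    return decorate("hello " + name);
}
```

Unlike the static-method trick, a lambda can also capture local variables of the enclosing function.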

c++: separate methods from their classes

If you have a list of objects and a list of methods you want to call on each of them, you usually write a loop that calls every method of the object line by line. That's an acceptable approach, but I want to propose another one: keep a list of objects and a list of pointers to their methods. First, define a class hierarchy.

#include <iostream>
#include <list>

class One
{
    public:
    virtual void draw0() = 0;
    virtual void draw1() = 0;
};

class Two: public One
{
    public:
    virtual void draw0()
    {
        std::cout << __func__ << std::endl;
    }
    virtual void draw1()
    {
        std::cout << __func__ << std::endl;
    }
};
Define a function that calls an object's method through a pointer to that method.
template<typename T>
void
draw(T *b, void (T::*f)())
{
    (b->*f)();
}
Define two lists: objects and their methods.
    std::list<One *> ls;
    std::list<void (One::*)()> lsm;
Push the methods you want to call onto the list of methods.
    lsm.push_back(&One::draw0);
    lsm.push_back(&One::draw1);
And push the objects onto the list of objects.
    Two T0, T1;
    ls.push_back(&T0);
    ls.push_back(&T1);
Now you can just iterate over the list of objects and over the list of their methods.
    for (std::list<One *>::iterator i(ls.begin()), j(ls.end());i!=j;++i)
        for (std::list<void (One::*)()>::iterator o(lsm.begin()), p(lsm.end());o!=p;++o)
            draw(*i, *o);
With this example you should get
draw0
draw1
draw0
draw1
as the output.
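
The pieces above can be put together into one unit. The variant below is a sketch with two deliberate changes that are my assumptions, not part of the original snippets: the draw methods append their names to a log instead of printing, so the call order can be checked, and C++11 range-based loops replace the explicit iterators. `Log` and `draw_all` are made-up names.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

typedef std::vector<std::string> Log;

class One
{
    public:
    virtual ~One() {}
    virtual void draw0(Log &log) = 0;
    virtual void draw1(Log &log) = 0;
};

class Two: public One
{
    public:
    virtual void draw0(Log &log) { log.push_back(__func__); }
    virtual void draw1(Log &log) { log.push_back(__func__); }
};

// Calls every listed method on every listed object and returns the call log.
Log draw_all()
{
    std::list<void (One::*)(Log &)> methods;
    methods.push_back(&One::draw0);
    methods.push_back(&One::draw1);

    Two first, second;
    std::list<One *> objects;
    objects.push_back(&first);
    objects.push_back(&second);

    Log log;
    for (One *object : objects)
        for (auto method : methods)
            (object->*method)(log); // virtual dispatch ends up in Two

    assert(log.size() == objects.size() * methods.size());
    return log;
}
```

The log comes back in the same order as the printed output listed above: both methods for the first object, then both for the second.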

Sunday, January 20, 2008

importance of documentation

I used to hear from some people that code documentation is very important. Of course I agree with those people, but I have some suggestions that might clarify this point of view.

  • Don't document the code. Write good, readable code instead. If you can't read the code without documentation, the code is bad and should be rewritten. Self-documenting code avoids outdated documentation. I have some experience with code that had been changed while its documentation hadn't, and I spent tons of time finding the bug. You might want to tell me that you always update your code comments. But think about bugfixes, especially HOT bugfixes. You don't have time to update the documentation: your first priority is to fix the code and to test the changes. You don't think about the documentation. You think about the code.
  • Use clear names for variables/routines. That will help you understand what a block of code does. Write in English/Russian/Portuguese/etc. instead of c/c++/python/etc. Don't be afraid of long names. Be afraid of names that don't mean anything at all.
  • Document headers/declarations. It's not that bad if the documentation of a function/class/etc. declaration is outdated. If a routine doesn't do what the documentation says, you'll look at the code and fix either the code or the documentation. Your friend probably won't use the routine, which is not that bad: it avoids possible problems. =)
  • Find some time to update the documentation. It's much easier to update the declaration documentation because you usually know what a routine does or what it should do. By the way, it's much easier to navigate through the declaration documentation than through the code documentation. Code documentation is usually spread through the code, and you have to spend more time finding it. While updating the documentation you might eventually discover that you can improve the model of your application/module/class/process/etc.

If you maintain any code, you should make a habit of writing good code instead of documenting it, of documenting the declarations, and of making time to update the documentation. I suppose that these three simple habits will make your life easier ;). Be simple. Don't assume that documentation can only help, or that your description of the process is correct.
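
A tiny sketch of what that can look like in practice. The routine and all of its names are invented for illustration: the declaration carries the documentation, and the body relies on names rather than comments.

```cpp
#include <cassert>

// Compare with the unreadable version: double f(double a, double b);

/// Returns the price after a percentage discount has been applied.
/// discount_percent must be between 0 and 100.
double price_after_discount(double original_price, double discount_percent)
{
    assert(discount_percent >= 0.0 && discount_percent <= 100.0);

    const double discount_amount = original_price * discount_percent / 100.0;
    return original_price - discount_amount;
}
```

A hot bugfix inside the body can't make a comment go stale there, because the body has none; only the declaration comment has to be kept honest.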

Sunday, January 13, 2008

postgresql fetching data

Every time you want to fetch data from the DB you use either PQexecParams or PQexec. Their signatures:

PGresult *PQexecParams(PGconn *conn,
                       const char *command,
                       int nParams,
                       const Oid *paramTypes,
                       const char * const *paramValues,
                       const int *paramLengths,
                       const int *paramFormats,
                       int resultFormat);
and
PGresult *PQexec(PGconn *conn, const char *command);
Yep, the difference is pretty big: PQexecParams has six more arguments. The most interesting one, the one that changes what a SELECT result looks like, is resultFormat. PQexec doesn't have it. According to PostgreSQL's manual:

if resultFormat is set to zero it'll obtain results in text format, or if it is set to one it will obtain results in binary format.

Great! I want to get data in binary format. That makes it safe to fetch bytea (aka blob) data. Do you know what 'obtain in binary format' means? The PostgreSQL manual hazily says:

There is not currently a provision to obtain different result columns in different formats, although that is possible in the underlying protocol.

That means that in 'binary mode' bytea comes back as raw bytes, integer types come back as binary integers (short, int, long, ...) in network byte order, a date is returned as an integer, and so on. Yes, I love that PostgreSQL lets me obtain data in its native form.

My suggestion is to add resultFormat to PQexec.
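
As a sketch of what handling such a result involves: a binary 4-byte integer column arrives in network byte order (most significant byte first) and has to be converted before use. The helper below is hypothetical and deliberately independent of libpq; in real code the pointer would come from PQgetvalue and the length from PQgetlength.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes one 4-byte integer field that was received
// in binary format, i.e. big-endian (network byte order).
std::int32_t decode_int4(const char *bytes, std::size_t length)
{
    assert(length == 4); // a binary 4-byte integer field is exactly 4 bytes

    const unsigned char *b = reinterpret_cast<const unsigned char *>(bytes);
    const std::uint32_t value = (static_cast<std::uint32_t>(b[0]) << 24)
                              | (static_cast<std::uint32_t>(b[1]) << 16)
                              | (static_cast<std::uint32_t>(b[2]) << 8)
                              |  static_cast<std::uint32_t>(b[3]);
    return static_cast<std::int32_t>(value);
}
```

Assembling the value byte by byte keeps the helper correct on both little-endian and big-endian hosts.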

Tuesday, January 8, 2008

f(read|write) and (pipe|fifo)s

f(read|write) have the following syntax: f(read|write)(void *data, size_t size, size_t nmemb, FILE *stream). Each time I used them I wondered what nmemb is for. Surely it's much faster to (read|write) size bytes of data once than (size/nmemb) bytes nmemb times. That's what I thought until I had to deal with (fifo|pipe)s. There the data can come through in pieces smaller than the n bytes you asked for. f(read|write)(data, n, 1, stream) counts one item of n bytes, so when fewer than n bytes get through it reports 0 items and you can't tell how much was actually transferred. Calling f(read|write)(data, 1, 1, stream) n times works, but it costs n calls. Then I understood that it's much easier to call f(read|write)(data, 1, n, stream): it handles up to n bytes in one call and returns the number of bytes actually transferred. Time is expensive, even if it's not yours but the machine's! Save your moneytime!
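
The role of size and nmemb shows up in fread's return value, which counts complete items. A minimal sketch, assuming a POSIX system (pipe, fdopen); `fread_from_pipe` is a made-up helper that puts a few bytes into a pipe, closes the write end, and then asks fread for more than is available.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>   // fdopen is POSIX, so the C header is used here
#include <unistd.h>  // pipe, write, close

// Hypothetical helper: writes `available` bytes into a fresh pipe, closes the
// write end, then calls fread(buffer, size, nmemb, ...) and returns its result.
std::size_t fread_from_pipe(std::size_t available, std::size_t size, std::size_t nmemb)
{
    char payload[64] = {0};
    char buffer[64];
    assert(available <= sizeof payload);
    assert(size * nmemb <= sizeof buffer);

    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    const ssize_t written = write(fds[1], payload, available);
    close(fds[1]); // the reader sees end-of-file after the payload

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return 0;
    }

    std::size_t items = 0;
    if (written == static_cast<ssize_t>(available))
        items = fread(buffer, size, nmemb, in);
    fclose(in); // also closes fds[0]
    return items;
}
```

With 3 bytes in the pipe, asking for one 8-byte item yields 0 (no complete item), while asking for eight 1-byte items yields 3, so only the second form tells you how much data arrived.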

optimizations are (not) evil

Once upon a time I heard that optimizations are evil. I don't think they are evil, but they should have their time and place. Every time you optimize during development you make mistakes. You uncover hidden mistakes (+) and introduce new ones (-). My opinion is that optimization should be performed after the (process|code|project|.*) development. When you try to optimize something during development, you might not know whether the optimization is needed there at all. What is it for? (What|Who) will be affected? And these optimizations may well turn out useless, because at the next stage you decide to use another technology.

When the whole system is finished, you can squint your eyes and search for the bottlenecks, because now you know how the whole system works. When you think you've found a bottleneck, ask your friend about it. When you are looking for unoptimized parts you _want_ to find them, even if they don't exist. And then, of course, you'll find a couple. =) Each time you want to optimize, measure the machine run_time and your (optimization+bugfixing+coffee)_time. Compare them and make the right decision.

I've decided to introduce a special post-release stage: the post-release-optimization stage. During this stage I can't break the release, and I know what the current release does. All changes go to the "-pro" release and to the next release. And now I'm proud to have "-pro" releases =).
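
Measuring the machine run_time doesn't need much. Here is a rough sketch, assuming C++11 and its std::chrono clock; `seconds_taken` is a made-up helper, and a real measurement would also need realistic input and several runs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// average wall-clock time of one run, in seconds.
template<typename Work>
double seconds_taken(Work work, std::size_t repetitions)
{
    assert(repetitions > 0);

    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repetitions; ++i)
        work();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;

    return elapsed.count() / static_cast<double>(repetitions);
}
```

Multiply the saving per run by how often the code really runs, and compare that with your own (optimization+bugfixing+coffee)_time.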

Thursday, January 3, 2008

PF_UNIX

The main purpose of PF_UNIX is local interprocess communication. Sometimes you need a daemon that listens on localhost for incoming connections and performs some actions (data bus, server backend, etc.). What's the main advantage of PF_UNIX compared with localhost-only network sockets? There are several reasons to prefer a unix socket over a localhost-only network socket:

  • Less network overhead. The kernel doesn't need to wrap the data in tcp/ip/ethernet/... layers.
  • Higher security. A unix socket is a file in the filesystem. It has access permissions, so you can restrict access to it for everyone except a specified user/group.
  • Anonymity. You can call socketpair to create anonymous connected unix sockets.
Why not use pipes?
  • For two-way communication you need two pipes: one for reading and another one for writing.
  • Using the socket paradigm you can quickly switch to network sockets if you need to.
  • Using sockets you can easily handle an arbitrary [system-limited] number of connections*
* - you can do that with pipes too, using your own messaging protocol. It would look like datagram socket communication semantics.
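
A minimal sketch of the socketpair case, assuming a POSIX system: one anonymous pair of connected unix sockets carries data in both directions, where pipes would need two. `exchange_over_socketpair` is a made-up name, and AF_UNIX is used as the address-family spelling of PF_UNIX.

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h> // socketpair, AF_UNIX, SOCK_STREAM
#include <unistd.h>     // read, write, close

// Hypothetical helper: sends `request` from end 0 to end 1 and `reply` back
// from end 1 to end 0, and returns what each end received, joined with '|'.
std::string exchange_over_socketpair(const std::string &request, const std::string &reply)
{
    assert(!request.empty() && request.size() <= 256);
    assert(!reply.empty() && reply.size() <= 256);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return "";

    char buffer[256];
    std::string received;

    // Direction one: end 0 writes, end 1 reads.
    if (write(fds[0], request.data(), request.size()) == static_cast<ssize_t>(request.size())) {
        const ssize_t count = read(fds[1], buffer, sizeof buffer);
        if (count > 0)
            received.append(buffer, static_cast<std::size_t>(count));
    }
    received += '|';

    // Direction two: the same pair, now end 1 writes and end 0 reads.
    if (write(fds[1], reply.data(), reply.size()) == static_cast<ssize_t>(reply.size())) {
        const ssize_t count = read(fds[0], buffer, sizeof buffer);
        if (count > 0)
            received.append(buffer, static_cast<std::size_t>(count));
    }

    close(fds[0]);
    close(fds[1]);
    return received;
}
```

In a real program the two ends would normally live in different processes, for example a parent and the child it forked.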