
Wednesday, September 29, 2010

Elementary Analysis of TinyPython Virtual Machine v1.1

OVERVIEW
TinyPython (TinyPy) is a minimalist implementation of Python in 64K of
code, originally created by Phil Hassey. He blogs at:

Its parser and byte-code compiler are written in TinyPython itself, which
makes it fully bootstrapped: TinyPy first converts a Python script (.py)
into a special TinyPy byte-code format (.tpc), and this generated
byte-code is then passed to a subset of the TinyPython source code called
the Virtual Machine (VM), where the actual execution takes place.

One can even extend this idea: if the VM is compiled into a low-level
format suited to a particular microcontroller, the VM can reside on that
chip, and any .tpc file can then be downloaded to the chip as its input.

A basic analysis of source code for TinyPy v1.1, excluding the
byte-code compilation phase, was performed. The focus was
on the working of the TinyPy Virtual Machine, which actually
executes the byte-code.

The TinyPython source code used was downloaded from:

BUILDING TINYPY
Initially, the file listing will look like the following figure:
Fig 1. TinyPy source file contents

   Initially, the 'build' folder will be empty. The 'doc',
'examples' and 'modules' folders may or may not contain
documents, Python scripts and batteries (modules)
respectively, depending on the downloaded package.
LICENSE.txt, README.txt, CHANGES.txt and
ROADMAP.txt describe the essential details of TinyPy.
The 'setup.py' script contains the code to build TinyPy
from scratch. At the terminal, type:
    python setup.py linux

This implies that a Python interpreter must already be available on
your system. The 'linux' option specifies that the code should be
compiled to work in a Linux environment.

Now, a new executable 'tinypy' will appear in the 'build' folder.
To fully bootstrap and test TinyPy, give the command:
    python setup.py linux boot

Now, in the 'tinypy' folder shown above, two new
executables, 'tinypy' and 'vm', will appear; these are the
TinyPy parser and Virtual Machine respectively. Notice
that new .pyc files for the corresponding .py files have
also been generated by the Python interpreter.

The general usage and some options available are listed below:
    python setup.py command [option] [module]

64k - build a 64k version of the TinyPy source code
blob - build a single 'tinypy.c' and 'tinypy.h'



The 'py2bc.py' script is used to convert a user-generated
Python script into its corresponding .tpc file:
    tinypy_path py2bc.py sample.py sample.tpc


Here, tinypy_path is the path (relative to the current directory)
of either the tinypy executable in the 'build' folder or the one
in the 'tinypy' folder. 'sample.py' is the name of the user script,
and 'sample.tpc' is the name given to the byte-code output file.


Finally, the generated byte-code (.tpc) is passed to the VM for
execution. Assuming the current directory is the 'tinypy' folder,
this is done as:
    ./vm sample.tpc

Or, equivalently, compiling and running the VM by hand:

    gcc vmmain.c -lm
    ./a.out sample.tpc

The file 'vmmain.c', present in the 'tinypy' folder, holds the main
function of the VM and automatically links in all the
other files necessary for the VM. The output is then
obtained and displayed. For a better picture, the files actually
needed for the VM are:
Fig 2. The files needed for the VM

For debugging and understanding the flow of control within
the source code, the 'gdb' and 'ctags' tools were used.
ANALYZING THE VM
A sample program 'sample.py':
    def add(a, b):
        c = a + b
        return(c)
    print(add(1, 9))
gives you the output:
      10

Each time the VM is invoked, a new instance of the VM is
created:
    tp_vm *tp = tp_init(argc, argv);

"Each object in tinypy is stored as a union in the C API."
tp_obj is tinypy's object representation.
typedef union tp_obj
{
    int type;
    tp_number_ number;
    struct{int type;int *data;}gci;
    tp_string_ string;
    tp_dict_ dict;
    tp_list_ list;
    tp_fnc_ fnc;
    tp_data_ data;
}tp_obj;

The field 'type' of the union indicates the type of the object.
A type value of 1 indicates that the object is a number,
type = 2 indicates a string, and type = 4 indicates a list.

In the sample program, a function in the VM 'tp_add' does the
job of adding two arguments (numbers, strings, lists) that are
passed to the function as input.
     tp_obj tp_add(TP, tp_obj a, tp_obj b)

The function definition of 'tp_add' contains three arguments:
   • TP -> the VM instance
   • tp_obj a -> the union variable representing the object 'a' in
                        the sample program
   • tp_obj b -> the union variable representing the object 'b' in
                        the sample program

'a' and 'b' are stored as unions and contain fields for types
such as numbers, strings, lists, etc.

Consider two simple lists in Python:
    a = [1,2,3]
    b = [4,5,6]

The VM uses a function 'tp_extend' which returns a union that
contains the new (extended) list. Its 'type' would be 4 indicating
a list. The 'list' structure variable includes a pointer *val that
points to another structure '_tp_list'.
    typedef struct _tp_list
    {
        int gci;
        tp_obj *items;
        int len;
        int alloc;
    }_tp_list;

The pointer *items, as you can see, is of type tp_obj;
dereferencing it gives you the tp_obj union. This
union contains a single element of the final list.

To obtain the next element of the final list:
    r.list.val->items
gives you the address of the current union, and

    address of the union holding the next element =
        address of the current union + size of one union

Here:
    r.list.val->items + 0x10 = 0x96ca048 + 0x10
gives you the union that stores the next element of the
list. The size of the union is 0x10, i.e. 16 bytes.
Fig 3. The representation of a list

THE PATH OF 'tp_add'
Fig 4. The backtraced path of 'tp_add'

   While debugging, our sample program and one of its
inputs can be spotted as:
Fig 5. Spotting the sample program
Fig 6. Spotting one of the inputs

   The switch() inside 'vm.c' is invoked nine times, one of
which results in the selection of case 'TP_IADD' and the
subsequent execution of 'tp_add'.
Fig 7. The case: 'TP_IADD'

Another trial debugging was done with input file 'testing.tpc', 
whose source script was 'testing.py' which just added two 
numbers. Only the filename 'testing.tpc' was given to the 
debugger.

The VM snapshot taken while debugging (showing the
name of the source script 'testing.py' and the statement
'print(a + b)' stored in the fields of the VM data structures),
the conclusion of this analysis, and more details can be
found in:

Wednesday, September 15, 2010

Memoization

Computers are very fast machines, but computation time
also depends on the algorithm we choose. If a proper
algorithm is not chosen, execution can take a very long
time.

If we write a simple recursive Fibonacci function for the
nth element, fib(n), we get fine results, but only for low
values of n. This is because the naive fib(n) has exponential
time complexity: its running time grows so fast that even for
modest values of n (where n is the size of the input), our
program will take a very long time to finish.

Memoization is the technique by which we speed this up.
The underlying principle is to create an associative array, say
dict, whose keys are n and whose values are fib(n). Each time
fib(n) is called, the associative array is checked for the key n.
If it is present, just return dict[n]; otherwise, compute the value
and store a new key: value pair in dict. The number of recursive
calls is thus drastically reduced by this look-in-the-lookup-table policy.

Of course, memoization improves time complexity, but it
consumes extra space.
The implementation in Python looked like this:

def memoized_fib(x):
    if dict.has_key(x):
        return dict[x]
    else:
        dict[x] = fib(x)
        return dict[x]

def fib(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return memoized_fib(n-2) + memoized_fib(n-1)

dict = {0: 0, 1: 1}
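
An equivalent way to package the same idea (not from the original post; just a sketch) is a small memoizing decorator:

def memoize(func):
    cache = {}                     # maps n -> already-computed fib(n)
    def wrapper(n):
        if n not in cache:
            cache[n] = func(n)     # compute once, reuse afterwards
        return cache[n]
    return wrapper

@memoize
def fib(n):
    if n < 2:
        return n
    return fib(n - 2) + fib(n - 1)

print(fib(100))                    # returns practically instantly

Because the decorated name 'fib' is the caching wrapper, every recursive call also goes through the cache.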


In Python, the recursion limit is fixed by the interpreter itself. It can
be queried with sys.getrecursionlimit() and changed with:
sys.setrecursionlimit(<limit>)

Now, set a very high limit and run the code. If it is high enough, you will
exhaust the stack (typically 8 MB) of the C backend of your Python
virtual machine, leading to a segmentation fault there, which is
propagated back to your Python interpreter, crashing Python itself!
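
A small sketch of inspecting and raising the limit (the default value and the exact crash point vary with the interpreter build, so treat the numbers as illustrative):

import sys

print(sys.getrecursionlimit())   # commonly 1000 by default

def countdown(n):
    if n == 0:
        return 0
    return countdown(n - 1)

sys.setrecursionlimit(3000)      # raise the limit; use with care
print(countdown(2500))           # fine now; a huge limit plus equally deep
                                 # recursion can still exhaust the C stack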

Monday, September 6, 2010

Classes in Python

In Python, "classes are objects too". Let me clarify.

Let me define a sample class.
>>> class abc: pass
>>> abc
<class __main__.abc at 0xb76e341c>
It indicates that in Python, classes are not abstract!
They too take memory space.


Let's explore further.
>>> m = abc
>>> m
<class __main__.abc at 0xb76e341c>
>>> m = abc()
>>> m
<__main__.abc instance at 0xb76e298c>
i.e., in Python, both classes and their objects are concrete.

Now,
>>> abc.p = 10
>>> abc.p
10
Classes can act as objects, as noted above!


Still further,
>>> m = abc()
>>> m.p
10
>>> m.q = 20
>>> m.q
20
>>> abc.q
AttributeError: class abc has no attribute 'q'
What exactly happened here?


Things to understand:
1. Here, "." works as a search operator.
2. On querying m.p, Python first checks whether there is a p attribute on m itself.
3. If not, it automatically finds the class of m and searches for p there.
4. The p attribute here actually belongs to abc, not m.
5. Assignment is different: m.q = 20 creates a q attribute on the instance m only, so abc never gets one.
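
One way to watch this lookup in action (a sketch; the bookkeeping entries and addresses will differ on your machine) is to inspect the attribute dictionaries involved:

class abc: pass

abc.p = 10            # stored on the class
m = abc()
m.q = 20              # stored on the instance only

print(abc.__dict__)   # contains 'p' (plus a few bookkeeping keys)
print(m.__dict__)     # contains only 'q'
print(m.p)            # 10: not found on m, so the search moves on to abc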

Sunday, September 5, 2010

C and Python: A Language Faceoff

1. Data Types
    Python is a good example of a dynamically typed programming
    language. This means there is no need for type declarations: any data
    can be initialized directly, and the corresponding type is automatically
    inferred by the Python interpreter.
    >>> a = 12
    >>> c = 'hello'

    C is a statically typed language. We need to declare a variable first;
    only then are we allowed to assign to it. The C compiler understands
    the type of the data and assigns suitable storage space only when we
    declare it.
    int a, b = 2;
    char c = 'h';
    a = 10;

2. Array Indexing
    The mechanism of array indexing in C is as follows: an element's
    address is given by 'base address + offset'.
    Therefore, however large the array is, the C compiler only needs to add
    the offset to the base address to access the required array location.

    In Python, the situation is different. The interpreter is obliged to
    follow 2 rules:
    i. check whether the current array index is out of bounds
    ii. access that index location

    In short, it is even possible to go out of bounds in C; garbage values
    at those locations will simply be returned. That's all. But Python is
    very conservative: you specify an array length, you had better stay
    inside it. No bending the rules (see the sketch below).

    Finally, the outcome. As you can guess, array indexing in C is blazing
    fast. If we work on the same array in Python, we may have to wait,
    maybe really long.
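
    A tiny sketch of that bound check:

    a = [10, 20, 30]
    print(a[2])          # fine: the last valid index
    try:
        print(a[3])      # out of bounds: Python refuses and raises IndexError
    except IndexError as e:
        print(e)         # "list index out of range"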

3. Dynamic Allocations
    At the outset, I take the liberty of assuming you know what dynamic
    allocation is.

    It is very troublesome in C. Simply speaking, if you do a malloc(),
    then reuse the same pointer for another malloc() without saving or
    freeing the first block, you lose track of that block forever. You
    can't use it anymore. So be wary; that's all you can do.

    For this matter, Python is awesome. It takes care of such
    unhandled allocations of memory automatically. This is called garbage
    collection, or automatic memory management. We can say that each
    allocated block has a reference count, which holds the number of
    references currently pointing to that particular block. This count is
    incremented and decremented automatically and accurately (see the
    sketch below). There is one consequence, though: the running time
    becomes somewhat non-deterministic, because we can't predict when and
    how Python attempts this.
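
    The reference count mentioned above can even be peeked at with
    sys.getrefcount (a sketch; the exact numbers are implementation
    details, and the call itself adds one temporary reference):

    import sys

    block = [1, 2, 3]
    print(sys.getrefcount(block))   # at least 2: 'block' plus the call's own reference

    alias = block                   # a second name now refers to the same list
    print(sys.getrefcount(block))   # one higher than before

    del alias                       # dropping a name decrements the count again
    print(sys.getrefcount(block))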

4. Indentation
    In short, indentation is compulsory in Python. The tutorial's
    'Intermezzo: Coding Style' proposes 4 spaces, while the Google coders
    are seen using only 2.

    In C, it is just the other way around. There, indentation is only a
    matter of readability; the C compiler doesn't care.
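
    For instance, the interpreter enforces the layout itself (a small sketch):

    def greet(name):
        return "hello " + name    # 4-space indentation, as the style guide suggests

    print(greet("world"))

    # Stripping the indentation from the 'return' line would not merely look
    # untidy: Python would reject the file with an IndentationError.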

5. Functions
    In Python, all functions, both built-in and user-defined, are first
    class. A function is said to be first class if it can be treated just
    like a variable. This means that, in Python, functions can be passed as
    parameters, and returned too. Cool, eh? If so, how about knowing
    that this concept extends to creating higher-order functions?
    These are functions which operate on other functions, e.g. integration,
    differentiation. Three built-in higher-order functions in Python are
    map, filter and reduce (see the sketch below).

    In C, it's just the opposite. Functions are functions, variables are
    variables. No mixing up.
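
    A small sketch of the three built-ins mentioned above, plus a function
    being returned (Python 2 shown, where map and filter give plain lists):

    def square(x):
        return x * x

    nums = [1, 2, 3, 4, 5]

    print(map(square, nums))                 # [1, 4, 9, 16, 25]
    print(filter(lambda x: x % 2, nums))     # [1, 3, 5]: keep the odd numbers
    print(reduce(lambda a, b: a + b, nums))  # 15: fold the list into a sum

    def make_adder(n):         # builds and returns a new function
        def add(x):
            return x + n
        return add

    add5 = make_adder(5)
    print(add5(10))            # 15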

6. Pointers
    They are the most awesome weapon in C. You can do powerful things
    with them that can only be dreamed of, or are very hard to
    accomplish, in other languages. But as powerful as they are, be very
    careful with the memory manipulation you do through them, or else
    you are a goner.

    They are not implemented as such in Python. But some similarity can be
    felt in some operations that can be performed. For a brief idea, refer
    to "Pointers" in Python.

7. Assignments in Expressions
    In C it's possible to write:
    int a, b = 1;
    if ((a = 12) > b)
            printf(" %d \n", a + b);

    Well, it's not possible in Python: assignment cannot appear inside an
    expression there; only tests such as logical equality (==) can.
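
    In Python the same logic has to be spelled out as separate statements;
    a quick sketch:

    b = 1
    a = 12            # assignment is a statement of its own, not an expression
    if a > b:
        print(a + b)  # 13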

Saturday, September 4, 2010

The Shebang Line

Well, this was an Easter Egg for me, so I will directly move on to what I
mean, without keeping you bewildered.

In computing, a shebang refers to the character sequence '#!'. If it is used
in a script, it must come at the very beginning of the first line. Hence the
common term shebang line.

The shebang line contains the absolute path of the interpreter binary for
the particular script you are running. On running executable script files,
the shebang will be automatically looked for and used, resulting in a 'smart
script', or so we can say.

The shebang line for Python scripts is:

#! /usr/bin/python

As you might be wondering, this weird name comes from an inexact
combination of sharp (#) and bang (!), which are the Unix names of
the two characters. The other names are hashbang, hashpling,
poundbang, and crunchbang. Don't ask me... !!!

Interestingly, the shebang was introduced by Dennis Ritchie at Bell
Laboratories (Edition 8 of Research Unix). It was then also added to BSD
releases from Berkeley's Computer Systems Research Group.

If the absolute path of the interpreter binary is the same on all the
machines the script is run on, then portability is achieved too.
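
For example, a minimal 'smart script' (file name hypothetical) could look like
this; after a chmod +x hello.py it can be launched directly as ./hello.py:

#! /usr/bin/python
# hello.py -- the kernel reads the first line and hands this file to the
# interpreter named there

print("hello from a smart script")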

More Easter Eggs:
1. When portability issues are encountered, the shebang line
    #! /usr/bin/env python
   may provide a solution.
2. The shebang actually represents a magic number in the executable
    file. This magic string is 0x23 0x21. Executable files that do not require
    an interpreter start with other magic combinations.
3. The shebang line for Perl is,
    #! /usr/bin/perl

Wednesday, September 1, 2010

Python Tutorial, Release 2.7: Abridged (heavily!)

This tutorial was originally written by Guido van Rossum and edited by
Fred L. Drake, Jr. It is an excellent one, excluding some specific portions.

I felt the urge to keep a record of all the key points regarding Python that
I grasped from this tutorial. So here goes...

1. The Python Interpreter
    Interactive mode: Ctrl + D, primary prompt, secondary prompt
    Non-interactive mode: 'python' command, sys.argv[ ]
    /usr/lib/python2.6 or /usr/lib/python3.1
2. Executable Python Script
    #! /usr/bin/python
    $ chmod +x 
3. The  Startup File
    PYTHONSTARTUP environment variable 
    os.path.isfile(), os.environ.get(), execfile(), .join()
    sys.ps1, sys.ps2
4. Basics 
    #, numbers, docstrings (''' or """),
    sequence data types: str, unicode, list, tuple, buffer, xrange
    string (effects of ' and "), list  (don't ever forget slicing!)
    Mechanism of slicing: (figure omitted; courtesy: Python Tutorial, Release 2.7)
    be wary of short circuit nature (e.g. and)
    if, elif, else, for in , range, break, continue, pass
5. Functions In Python
    def <function_name>(<argument(s)>): body
    call by value, first class, scope
    built-in higher order functions: map, filter, reduce, etc
    default argument values (evaluated only once!)
    keyword arguments: function (arg1, *arg2, **arg3)
                   *arg2  -> receives a tuple, **arg3 -> receives a dictionary
                    positional arguments before keyword arguments
    unpacking argument lists: *list_object or **dictionary_object
    lambda
6. Intermezzo Coding Style
    a, b = 10, 20 is valid (swapping is unimaginably easy!) 
    4-space indentation
    blanklines, comments, docstrings 
    spacing: a = f(1, 2) + g(3, 4) 
    self as name for first method argument
7. Functional Programming Tools
    list comprehension: >>> vec = [2, 4, 6]
                                        >>> [3*x for x in vec if x > 3]
                                        [12, 18]
    del (for lists), append() and pop() (lists as stacks), collections.deque (lists as queues)
    set (use it to eliminate duplicates on the fly !, thanks to Sumith) 
    list methods (.append(), .extend()); dictionaries (.keys(), .values(), .items())
8. Modules 
    import <module_name>, 
    from <module> import <method>
    from <module> import * (not advised)
    the [if __name__ == "__main__":] clause
    PYTHONPATH, sys.path, dir() function, packages 
9. Input and Output
    regular expressions (regex)
    sys.stdin, sys.stdout
    repr(), str(), str.format()
    open(), close(), read(), readline(), readlines(), seek(), write()
    pickling: pickle.dump(), pickle.load()  
10. Error Handling
    try: ... except: ..., try: ... except: ... else: ..., try: except: finally: ...,
    try clause: if functions present, tries inside it too
    else clause: to prevent except from catching the error caused
                          by some other piece of code
    finally clause: always executed
                   1. on leaving the try: except:, 
                   2. when error raised by try: not handled by any except:
                   3. when there is an error in the except: itself       
    class MyClass (Exception): pass
    raise Exception (arguments)
    except Exception as inst: (arguments stored in instance.args),
    with open(<filename>, <mode>) as <file_object>:
11. Classes
     class <class_name>[(<baseclass(es)>)]:pass
     Python classes are objects
     both class and its objects are concrete
     no sense in concept of 'declaration'
     each function attribute has at least one argument named 'self'
                                                       by convention 
     __init__, __str__, __eq__, __iter__
     isinstance()
12. Mechanism of for
     this idea can be applied to any iterable in Python
     (figure omitted; courtesy: Python Tutorial, Release 2.7)

13. Concept of Generators
     yield -> return the value, then freeze there until next()
     (figure omitted; courtesy: Python Tutorial, Release 2.7)
     "genex" -> generator expression
                            (e.g. write a list comprehension,
                                then replace [] with ())
                            >>> sum(i * i for i in range(0, 10))
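
As a quick illustration of items 12 and 13 (a sketch, not taken from the
tutorial itself):

def squares(n):
    for i in range(n):
        yield i * i          # hand back a value, then freeze here until next()

gen = squares(4)
print(next(gen))             # 0
print(next(gen))             # 1

for value in squares(4):     # the for statement drives the same protocol
    print(value)             # 0, 1, 4, 9

print(sum(i * i for i in range(0, 10)))   # 285, via a generator expression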

Monday, August 30, 2010

"Pointers" in Python

Firstly, I want you to understand that the term "pointers" used here is a
misnomer.

In Python, there is no particular data type called a pointer, as there is in
C. Even so, some operations that we can perform in Python are very similar
to the behaviour of a pointer.

OK, I've been going on about pointers; let me tell you what this is about.

Visualize that a new data object you create has to be stored at some memory
location. This memory location has an address.
In C, a pointer can be defined whose value is such a memory
address. Any valid operation done through this pointer will be
automatically reflected in the data present at that address.

Now lets see, where this happens in Python.

Let's create a list and copy it.
>>> a = [1,2,3]
>>> b = a
Now let's change an element of a.
>>> a[1] = 10
>>> a
[1, 10, 3]
Now check the value of b.
>>> b
[1, 10, 3]

We can see that b has also changed. How did this happen? 

When the list a was created, it was stored at a memory location, say 2000,
and the value 2000 was put in a. When a was assigned to b, it meant that
b now also contains the value 2000. Then we changed the second element of
a, i.e. the data at address 2000. Therefore, if we now print the value of
a or b, we get only the changed list, since both a and b are actually
"pointers" to the memory address 2000.

N. B.
1. Be careful with list assignments!
2. What I discussed above does not hold for slicing operations on a list.
    When any list is sliced, a copy of it is first created at another memory
    location, and then the operations are performed. Thus, the changes won't
    be reflected back in the original list (a short sketch follows).
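
A short sketch tying both notes together (the id() values themselves will
differ on every run):

a = [1, 2, 3]
b = a             # b is just another name for the same list object
c = a[:]          # slicing copies the list into a brand-new object first

print(id(a) == id(b))   # True: one object, two names
print(id(a) == id(c))   # False: the copy lives at a different address

a[1] = 10
print(b)          # [1, 10, 3] -- the change is visible through b as well
print(c)          # [1, 2, 3]  -- the sliced copy is unaffected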