pymantra: 2010

Tuesday, November 23, 2010

Extract comments in your python code

Comments in your code help a new person understand the logic and flow of a program. I guess people like programming more when they comment properly, as I do. Good comments save you from reading code line by line. The best way to comment is in a form you can extract easily. I mark comments in my Python code with '##', which makes them easy to extract from a script. The following code extracts my comments:

import sys
## usage: python extract_comment.py filename comment_delimiter
filename = sys.argv[1]
sepr = sys.argv[2]
with open(filename, 'r') as fl:
    for row in fl:
        if sepr in row:
            ## print from the delimiter to the end of the line
            ind = row.find(sepr)
            print row[ind:].rstrip('\n')

Save the code as extract_comment.py. filename is the Python script whose comments you want to extract, and comment_delimiter is the comment marker ('##' in my case).
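The '##' check above also matches text inside string literals, for example a message or URL that happens to contain '##'. A Python 3 sketch that swaps in the standard tokenize module, so that only real comments are returned, might look like this; the function name extract_comments is an illustrative choice, not part of the script above.

```python
import io
import tokenize

def extract_comments(source, delimiter='##'):
    """Return (line number, comment) pairs for comments that start with delimiter."""
    comments = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, tok_string, start, _, _ in tokens:
        # COMMENT tokens are real comments only, never text inside string literals
        if tok_type == tokenize.COMMENT and tok_string.startswith(delimiter):
            comments.append((start[0], tok_string))
    return comments

sample = 'x = "## not a comment"\n## real comment\ny = 1  ## trailing\n# plain\n'
for lineno, text in extract_comments(sample):
    print(lineno, text)
```

To use it on a file, pass the file's contents, e.g. extract_comments(open('my_script.py').read()).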

Tuesday, November 2, 2010

Simple Web Server in python

Recently I was working with Flex code that calls a Python script residing on another server through web services. It made me wonder: is it a good idea to use a web service just to call a Python script on another server? Why not use the cgi module or mod_python to get the same result?

So I decided to write a simple web server whose methods can be called through URLs. I got excellent help from
http://fragments.turtlemeat.com/pythonwebserver.php
and then added some code of my own.

        from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

        class VivekServer(BaseHTTPRequestHandler):

            def do_GET(self):
                try:
                    if self.path == '/fetch':
                        self.send_response(200)
                        self.send_header('Content-type', 'text/html')
                        self.end_headers()
                        idea, num, htm = self.wcount()
                        self.wfile.write("Number of links containing 'anyword': %d" % num)
                        self.wfile.write(" url is: %s" % htm)
                        return
                    if self.path == '/calculate':
                        self.send_response(200)
                        self.send_header('Content-type', 'text/plain')
                        self.end_headers()
                        for each in self.calculate():
                            # each item is a tuple, so format it before writing
                            self.wfile.write('%s\n' % (each,))
                        return
                    # any other path has no handler
                    self.send_error(404, 'File Not Found: %s' % self.path)

                except IOError:
                    self.send_error(404, 'File Not Found: %s' % self.path)

            def calculate(self):
                import random
                # 99 pairs of (random number 0-999, random capital letter)
                WORD = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                data = []
                for i in range(1, 100):
                    data.append((random.randrange(0, 1000), random.choice(WORD)))
                return data

            def do_POST(self):
                pass

            def wcount(self):
                from BeautifulSoup import BeautifulSoup as soup
                import urllib2

                # count the links on the page whose href contains 'anyword'
                htm = 'http://www.anyurl.com'
                html_text = urllib2.urlopen(htm).read()
                sp = soup(html_text)
                idea = sp.findAll("anyword")
                all_a = [each.get('href') for each in sp.findAll('a')]
                num = 0
                for each in all_a:
                    # href may be missing (None); 'in' also matches at position 0
                    if each and "anyword" in each:
                        num = num + 1
                return (idea, num, htm)

        def main():
            server = HTTPServer(('', 7999), VivekServer)
            try:
                print 'Server Started.....'
                server.serve_forever()
            except KeyboardInterrupt:
                print 'Server Ends.....'
                server.socket.close()

        if __name__ == '__main__':
            main()

To run the code, save it (for example as abovecode.py) and run: python abovecode.py

Then open a web browser and visit:
http://localhost:7999/fetch
http://localhost:7999/calculate
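In Python 3, BaseHTTPServer became http.server (and urllib2 became urllib.request). A minimal sketch of the same URL-to-method idea follows, assuming only the /calculate route and a hypothetical routes dictionary for dispatch; unknown paths get a 404 instead of an empty reply, and port 0 lets the operating system pick a free port.

```python
import random
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def calculate():
    """99 (number, letter) pairs, like the calculate() method above."""
    return [(random.randrange(0, 1000), random.choice(LETTERS)) for _ in range(1, 100)]

class RouteHandler(BaseHTTPRequestHandler):
    # hypothetical routing table: URL path -> function returning rows
    routes = {'/calculate': calculate}

    def do_GET(self):
        func = self.routes.get(self.path)
        if func is None:
            self.send_error(404, 'File Not Found: %s' % self.path)
            return
        body = ''.join('%s\n' % (row,) for row in func()).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet

def start_server(port=0):
    """Start the server in a background thread; port 0 lets the OS pick one."""
    server = HTTPServer(('127.0.0.1', port), RouteHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == '__main__':
    srv = start_server()
    url = 'http://127.0.0.1:%d/calculate' % srv.server_address[1]
    print(urllib.request.urlopen(url).read().decode().splitlines()[:3])
    srv.shutdown()
```

Keeping the routes in a dictionary means a new endpoint is one more entry rather than another if block in do_GET.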

Friday, September 10, 2010

Split a list into a number of pieces

Today, while working with CSV files, I ran into an interesting situation: I had a list of a million values to iterate over. I thought it would help to split the list into a number of pieces, so that processing it would not strain my code, memory and CPU. I could also have used a generator expression to avoid building large lists, but I was curious to write code that splits a list into pieces (using a little generator logic).
Here is the result:

    def split_seq(seq, num_pieces):
        """ split a list into num_pieces consecutive pieces of near-equal size """
        start = 0
        for i in xrange(num_pieces):
            ## piece i gets as many items as the slice seq[i::num_pieces]
            stop = start + len(seq[i::num_pieces])
            yield seq[start:stop]
            start = stop

    seq = range(100)   ## define your list here
    num_of_pieces = 3
    for piece in split_seq(seq, num_of_pieces):
        print len(piece), '-> ', piece
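A Python 3 variant of split_seq can compute each piece's length with divmod instead of measuring seq[i::num_pieces], which avoids building a temporary copy for every piece. The first len(seq) % num_pieces pieces get one extra item, which gives the same sizes as the version above (34, 33, 33 for 100 items in 3 pieces). A sketch:

```python
def split_seq(seq, num_pieces):
    """Yield num_pieces consecutive slices of seq whose lengths differ by at most 1."""
    size, extra = divmod(len(seq), num_pieces)
    start = 0
    for i in range(num_pieces):
        # the first `extra` pieces get one additional item
        stop = start + size + (1 if i < extra else 0)
        yield seq[start:stop]
        start = stop

pieces = list(split_seq(list(range(100)), 3))
print([len(p) for p in pieces])  # [34, 33, 33]
```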

Friday, September 3, 2010

Dynamically open files

Recently I had to split a 40 GB CSV file into 60 pieces for further processing, with each piece's name and content decided dynamically from an id present in the CSV file. That led me to write a little code to open, write and close files dynamically. I wrote the following code to do it.

a = range(10)
for each in a:
    ## build and run statements like: fl_3 = open('3','a')
    s = "fl_%s = open('%s','a')" % (each, each)
    exec s
    exec "fl_%s.write('%s')" % (each, each)
    com = "fl_%s.close()" % (each)
    exec com
    # you can also check whether the file is closed:
    # eval("fl_%s.closed" % each)  # True if the file is closed, else False
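Generating variable names with exec works, but a dictionary keyed by the id does the same job without exec and makes it easy to close every file at the end. A Python 3 sketch for the CSV case described above, assuming the id sits in a single column (id_column, a hypothetical parameter) and that the ids are safe to use as file names:

```python
import csv
import os

def split_csv_by_id(src_path, out_dir, id_column=0):
    """Append each row of src_path to out_dir/<id>.csv; return the sorted ids seen."""
    handles = {}   # id -> (file object, csv writer); replaces the exec'd fl_<n> names
    try:
        with open(src_path, newline='') as src:
            for row in csv.reader(src):
                key = row[id_column]
                if key not in handles:
                    f = open(os.path.join(out_dir, '%s.csv' % key), 'a', newline='')
                    handles[key] = (f, csv.writer(f))
                handles[key][1].writerow(row)
    finally:
        # close every file that was opened, even if reading failed part way
        for f, _ in handles.values():
            f.close()
    return sorted(handles)
```

With about 60 distinct ids, keeping all the files open at once stays well within typical open-file limits.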

Monday, August 30, 2010

Join integer values

While working with Python's csv module I came across something interesting about join.
I was reading a huge CSV file line by line, and for one operation I converted each row to a list and then that list back to a string. But my rows contained some integer values, so I always got
TypeError: sequence item 5: expected string, int found

So I am writing a small example to let newcomers know about this.
Say I have a list and I want to join it:
ls=['a','b',4,'c']
','.join(ls)
This ends up with: TypeError: sequence item 2: expected string, int found

Instead, convert every item to a string first:
','.join(map(str,ls))
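Since the data comes from CSV files anyway, it is worth knowing that csv.writer converts non-string fields itself and also quotes fields that contain the delimiter, which a plain join does not. A short Python 3 comparison (the extra 'x,y' field is only there to show the quoting):

```python
import csv
import io

ls = ['a', 'b', 4, 'c']

# str() each item before joining; a generator expression works like map(str, ls)
joined = ','.join(str(item) for item in ls)

# csv.writer converts non-strings itself and quotes fields containing commas
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerow(ls + ['x,y'])
print(joined)                  # a,b,4,c
print(buf.getvalue(), end='')  # a,b,4,c,"x,y"
```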

Wednesday, August 25, 2010

Dictionary as Generator

What will you do if you are building a dictionary dynamically and it ends up with millions of keys?
Accessing that dictionary later in your code can cost a lot of resources, can't it?
I got stuck in this kind of situation too; my dictionary had 10K million keys. So I iterated over the dictionary with a generator to make my work easier.
The following code shows how to use a generator over a dictionary's keys.

a = range(100000)
b = range(100000)
c = dict(zip(a, b))   # create a dictionary with 100000 keys
d_keys = (k for k in c.iterkeys())   # generator expression; in Python 2, c.keys() would first build a full list of keys
for key in d_keys:
    ## do your operation on keys
    pass
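In Python 3 the concern mostly goes away: dict.keys() and dict.items() return views instead of lists, so iterating them, or feeding them to a generator expression, does not copy the keys. A small sketch:

```python
c = dict(zip(range(100000), range(100000)))  # 100000 keys, as in the post

keys_view = c.keys()              # a view onto the dict, not a copy of its keys
print(type(keys_view).__name__)   # dict_keys

# the generator expression pulls one (key, value) pair at a time; no list of pairs is built
even_total = sum(v for k, v in c.items() if k % 2 == 0)
print(even_total)                 # 2499950000
```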

Monday, August 23, 2010

Rename multiple files simultaneously

Renaming multiple files at once is a little confusing on the command line. There are lots of ways to do it with a program, but I didn't find any on-the-spot command for it.
So I used Python to do it simply.
My requirement was:
1) I have one dedicated folder in which all files must be renamed.
2) All filenames to be renamed follow a pattern: I have to rename every dedupe_<number>.csv to <number>.csv.

I did this using the following code:

import os
from os import listdir, getcwd, rename

list_files = listdir(getcwd())
for filename in list_files:
    if not filename.startswith('.') and 'dedupe_' in filename:
        ## split off the extension, drop 'dedupe_' from the rest
        stem, ext = os.path.splitext(filename)
        new_name = stem.replace('dedupe_', '') + ext
        ## rename() needs no shell, so names with spaces are safe (unlike 'mv' via popen)
        rename(filename, new_name)

Isn't it simple!
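A Python 3 sketch of the same rename that matches only the dedupe_<number>.csv pattern from the requirement, refuses to overwrite an existing file and supports a dry run. The function name and the dry_run flag are illustrative additions, and the regular expression is an assumption based on the stated naming scheme:

```python
import os
import re

# only names like dedupe_42.csv
PATTERN = re.compile(r'^dedupe_(\d+)\.csv$')

def rename_dedupe_files(folder, dry_run=False):
    """Rename dedupe_<number>.csv to <number>.csv inside folder; return the (old, new) pairs."""
    renamed = []
    for filename in sorted(os.listdir(folder)):
        match = PATTERN.match(filename)
        if not match:
            continue
        new_name = match.group(1) + '.csv'
        if os.path.exists(os.path.join(folder, new_name)):
            continue  # never overwrite an existing <number>.csv
        if not dry_run:
            os.rename(os.path.join(folder, filename), os.path.join(folder, new_name))
        renamed.append((filename, new_name))
    return renamed
```

Running it once with dry_run=True shows what would change before anything is renamed.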

Subversion with SSL

I recently installed SVN on my system and configured it with SSL. Writing the steps down here may help me later, or help other people.

1. Install Apache (httpd)

sudo ./configure --prefix=/opt/vivek/apache --enable-dav --enable-so --enable-ssl

## if this gives you an error like "configure: error: ...No recognized SSL/TLS toolkit detected" then install

## apt-get install openssl libssl-dev

sudo make

sudo make install

2. Install the dependencies for Subversion (check them using sh ./autogen.sh)

a. Install SQLite.

b. Get the SQLite 3.6.13 amalgamation from:

http://www.sqlite.org/sqlite-amalgamation-3.6.13.tar.gz

Unpack the archive using tar/gunzip and copy sqlite3.c from the resulting directory to:

/home/vivek/Desktop/TGZS/subversion-1.6.12/sqlite-amalgamation/sqlite3.c

This file also ships as part of the subversion-deps distribution.

c. You need autoconf version 2.50 or newer installed (I used Synaptic).

d. You need libtool version 1.4 or newer installed.

3. Now install Subversion.

sudo ./configure --prefix=/opt/vivek/subversion --with-apxs=/opt/vivek/apache/bin/apxs --with-apr=/opt/vivek/apache/bin/apr-1-config --with-apr-util=/opt/vivek/apache/bin/apu-1-config --with-ssl

sudo make

sudo make install

4. After installation

groupadd svn

useradd -m -d /srv/svn/ -g svn svn

After adding the user, I went to Users and Groups and enabled the user (setting its password to 123456).

su - svn (enter the svn user's password, 123456)

$ mkdir /srv/svn/repositories/

$ mkdir /srv/svn/repositories/myproduct

$ mkdir /srv/svn/conf

$ /opt/vivek/subversion/bin/svnadmin create /srv/svn/repositories/myproduct

6. Add the following to apache/conf/httpd.conf for HTTP access by users:

<Location /repos>
  DAV svn
  SVNParentPath /srv/svn/repositories
  # our access control policy
  AuthzSVNAccessFile /srv/svn/conf/users-access-file
  # try anonymous access first, resort to real
  # authentication if necessary.
  Satisfy Any
  Require valid-user
  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /srv/svn/conf/passwd
</Location>

CustomLog logs/svn_logfile "%t %u %{SVN-ACTION}e" env=SVN-ACTION

That file, /srv/svn/conf/passwd, can be created using apache/bin/htpasswd:

htpasswd -m -c /srv/svn/conf/passwd vivek (use htpasswd --help first for options)

It will prompt you for vivek's password.

** This is how you add users for HTTP access. Leave out -c when adding further users, because -c recreates the file.

Add the following to /srv/svn/conf/users-access-file to set permissions for users:

[/]
* =

[myproduct:/]
vivek1 = rw
vivek2 = r

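Run svnserve with the repositories folder as its root, so that the URL svn://localhost/myproduct used below resolves to the myproduct repository:

/opt/vivek/subversion/bin/svnserve -d -r /srv/svn/repositories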

7. Now open the URL http://localhost/repos/myproduct.

8. Add the project as follows:

sudo /opt/vivek/subversion/bin/svn import myproduct file:///srv/svn/repositories/myproduct -m "added project"

/opt/vivek/subversion/bin/svn ls svn://localhost/myproduct

9. You can set permissions on the myproduct repository by editing /srv/svn/repositories/myproduct/conf/passwd and svnserve.conf.

Add the following to svnserve.conf:

[general]
anon-access = read
auth-access = write
password-db = passwd
authz-db = authz
# realm = My First Repository

[sasl]
use-sasl = true

Add the following to /srv/svn/repositories/myproduct/conf/authz:

[groups]
group1 = vivek1
group2 = vivek2

[/]
vivek = rw

[myproduct:/]
@group1 = rw
# read-only: members of group2 can check out but not commit
@group2 = r

10. If you want to disable credential caching permanently, you can edit your runtime config file (located in /home/vivek/.subversion/config).

[auth]
store-auth-creds = no

Thanks to http://queens.db.toronto.edu/~nilesh/linux/subversion-howto/

Friday, August 20, 2010

Call Python script from Java.

I am not good at core Java programming, but I am good at "Hello World" kinds of programs :) .
So I wrote a Java program that calls a Python script (it can also pass arguments). Take a look.

This is the Java code:
    import java.io.*;

    // run this way
    // javac JavaRunCommand.java
    // java -classpath . JavaRunCommand

    public class JavaRunCommand {

        public static void main(String args[]) {

            String s = null;

            try {
                // the command and each of its arguments are separate array elements
                String[] callAndArgs = {"python", "my_python.py", "arg1", "arg2"};
                Process p = Runtime.getRuntime().exec(callAndArgs);

                BufferedReader stdInput = new BufferedReader(new
                     InputStreamReader(p.getInputStream()));

                BufferedReader stdError = new BufferedReader(new
                     InputStreamReader(p.getErrorStream()));

                // read the output
                while ((s = stdInput.readLine()) != null) {
                    System.out.println(s);
                }

                // read any errors
                while ((s = stdError.readLine()) != null) {
                    System.out.println(s);
                }

                System.exit(0);
            }
            catch (IOException e) {
                System.out.println("exception occurred");
                e.printStackTrace();
                System.exit(-1);
            }
        }
    }

In the Java code above, I am calling the my_python.py script. That script can contain anything: wxPython, mod_python, CGI programming, just anything.
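For completeness, here is a hypothetical my_python.py that the Java program could call. It reads the arguments Java passes (arg1, arg2) from sys.argv and writes to stdout and stderr, which the Java code reads through p.getInputStream() and p.getErrorStream(). A sketch, assuming Python 3:

```python
import sys

def main(argv):
    """Report the arguments Java passed in; argv[0] is the script name."""
    args = argv[1:]
    if not args:
        # stderr is what the Java side reads through p.getErrorStream()
        sys.stderr.write('my_python.py: no arguments given\n')
    for position, value in enumerate(args, 1):
        # stdout is what the Java side reads through p.getInputStream()
        print('arg %d = %s' % (position, value))

if __name__ == '__main__':
    main(sys.argv)
```

Because the Java side reads stdout to the end before it reads stderr, a script that writes a lot to stderr could fill that pipe and stall; keeping diagnostics short, or merging the two streams on the Java side with ProcessBuilder.redirectErrorStream(true), avoids this.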

Thursday, August 19, 2010

optimize your code using dictionary

Today I was curious which method finds a key fastest in a dictionary, so I wrote a small script that reports the execution time of a few methods.

A programmer can use a script like this to optimize their programs.

import time

__doc__ = """
benchmark script for dict methods
"""

a = range(10000)
b = range(9000, 10000)
c = dict(zip(a, b))   ## zip stops at the shorter list, so c gets 1000 keys (0..999)

## all times are printed in milliseconds
t1 = time.time()
k = c.keys()
if 98 in k:
    t2 = time.time()
print 'if 98 in c.keys() :', (t2 - t1) * 1000

t1 = time.time()
if c.has_key(98):
    t2 = time.time()
print 'c.has_key(98) :', (t2 - t1) * 1000

t1 = time.time()
if 98 in c:
    t2 = time.time()
print 'if 98 in c: ', (t2 - t1) * 1000
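Timing a single lookup with time.time() mostly measures clock resolution, so numbers from the script above are noisy. The standard timeit module repeats each statement many times. A Python 3 sketch of the same comparison follows; has_key() no longer exists in Python 3, so list(c.keys()) stands in for Python 2's list-returning keys(), and the repeat count of 10000 is an arbitrary choice:

```python
import timeit

a = range(10000)
b = range(9000, 10000)
c = dict(zip(a, b))   # zip stops at the shorter input: 1000 keys, 0..999

# each check runs `number` times so the total is well above clock resolution
tests = {
    '98 in list(c.keys())': lambda: 98 in list(c.keys()),  # mimics Python 2's keys() list
    '98 in c.keys()': lambda: 98 in c.keys(),              # Python 3 view: hash lookup
    '98 in c': lambda: 98 in c,                            # direct hash lookup
}
results = {}
for label, func in tests.items():
    results[label] = timeit.timeit(func, number=10000)
    print('%-22s %.3f ms' % (label, results[label] * 1000))
```

Expect the list version to be the slowest by far, since it copies the keys and then scans them linearly, while the other two are hash lookups.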

Wednesday, August 18, 2010

Which is better: ZODB, Pickle or Shelve?

I have worked with all three, depending on my requirements.
I have a very large data file (at least 10 GB) that contains user records with a payments field. User ids repeat: there may be 1K records with the same user id but different payments, so I have to sum over the file to get the total payment for each user id.

I tried three approaches:

ZODB approach: I thought ZODB might help with this operation. I stored the user id as the key and the payment as the value, and updated the totals in a loop. But this only worked for small files; on bigger files it hung the system, with Python taking all the memory and CPU. It was impossible to do anything else while the ZODB program ran.

Pickle approach: From what I found about pickle/cPickle, it is good for object serialization, so I tried it for this job. I implemented the same algorithm (the ZODB key-value approach), but pickle ended with a "memory error", even though my system has 40 GB of RAM and 100 GB of swap. I also saw pickling use 98% of memory and 40% of CPU on average.

Shelve approach: shelve uses pickle underneath, yet it did my job well. I don't know why shelve worked when pickle did not. I followed the same algorithm (the ZODB approach). The only drawback I found is that while I/O to the shelve file is in progress, other I/O slows down. It used 3-4% of memory and 1% of CPU on average.

So my choice is shelve for bigger files, while ZODB/pickle work well for data files smaller than 4 GB.
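A likely reason shelve coped where pickle did not: pickle.dump serializes the whole dictionary as one object, so all of it has to be in memory, while shelve pickles each value separately and stores it under its key in a dbm database on disk. A Python 3 sketch of the summation with shelve, assuming a CSV input with the user id and payment in the columns given by the hypothetical id_column and payment_column parameters:

```python
import csv
import shelve

def sum_payments(csv_path, shelf_path, id_column=0, payment_column=1):
    """Sum payments per user id; running totals live on disk in a shelve file."""
    with shelve.open(shelf_path, flag='n') as totals, open(csv_path, newline='') as src:
        for row in csv.reader(src):
            user_id = row[id_column]              # shelve keys must be strings
            payment = float(row[payment_column])
            # read, update and write back one total; only this value is unpickled
            totals[user_id] = totals.get(user_id, 0.0) + payment
        return len(totals)                        # number of distinct user ids
```

flag='n' always starts from an empty shelf, so rerunning the script does not double-count earlier totals.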

Tuesday, August 17, 2010

Form submit using Twill

Recently I found an excellent use for Python's twill module. I had used twill before only to test the multiple-login functionality of one of my Plone sites, to check the load on my login script.

But some days back, I used twill to fetch the details of multiple users from a proxy site.

Here I am just going to explain the little code I wrote. It might be helpful for others.

What am I doing here?

I am opening google.com and searching for the term "Twill".

The first thing is: how do you use the twill module in Python code?

Download twill from http://twill.idyll.org/

import twill
import twill.commands

t_com = twill.commands

## get the default browser
t_brw = t_com.get_browser()

## open the url
url = 'http://google.com'
t_brw.go(url)

## get all forms from that URL
all_forms = t_brw.get_all_forms()   ## this returns a list of form objects

## now, choose only the form that uses the POST method
for each_frm in all_forms:
    attr = each_frm.attrs   ## all attributes of the form
    if each_frm.method == 'POST':
        ctrl = each_frm.controls   ## all control objects within that form (every html input inside the form)
        for ct in ctrl:
            if ct.type == 'text':   ## I did it as per my use, you can put your own condition here
                ct._value = "twill"
                t_brw.clicked(each_frm, ct.attrs['name'])   ## clicked takes two parameters: the form object and the name of the control to be clicked
                t_brw.submit()

## you might write the output (submitted page) to any file using content = t_brw.get_html()

## don't forget to reset the browser and outputs (these are function calls)
t_com.reset_browser()
t_com.reset_output()
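At the HTTP level, submitting a simple form means URL-encoding the field values and sending them to the form's action URL: in the query string for GET, in the request body for POST. For a simple form, a Python 3 standard-library sketch can build the same request without twill; the action /search and the field name q are assumptions about the target form, and build_form_request is a hypothetical helper:

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

def build_form_request(page_url, action, fields, method='GET'):
    """Build (but do not send) the request a browser would make when submitting a form."""
    target = urljoin(page_url, action)          # form actions are often relative
    data = urlencode(fields)                    # name=value pairs, URL-encoded
    if method.upper() == 'POST':
        return Request(target, data=data.encode('ascii'), method='POST',
                       headers={'Content-Type': 'application/x-www-form-urlencoded'})
    return Request(target + '?' + data)         # GET forms put the fields in the query string

req = build_form_request('http://google.com/', '/search', {'q': 'twill'})
print(req.get_method(), req.full_url)   # GET http://google.com/search?q=twill
# urllib.request.urlopen(req) would actually send it
```

twill remains the more convenient choice when a page needs cookies, hidden fields or several steps, since it reads the real form from the page instead of relying on hand-written field names.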