While working with Python's csv module, I ran into something interesting about join.
I was reading a huge CSV file line by line, and for one operation I converted each row to a list and then converted that list back to a string. But my rows contained some integer values, so I always got:
TypeError: sequence item 5: expected string, int found
So I am writing a small example to let newcomers know about this.
I have a list and I want to join it.
ls=['a','b',4,'c']
','.join(ls)
ends up with: TypeError: sequence item 2: expected string, int found
Instead, convert every item to a string first:
','.join(map(str,ls))
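Here is a minimal sketch of the same fix, assuming Python 3, where the error message reads "expected str instance, int found" instead. The helper name `join_row` is hypothetical. The second half shows that `csv.writer` converts non-string fields by itself, so no manual join is needed when the goal is to write the row back out as CSV.

```python
import csv
import io

def join_row(row, sep=","):
    """Join a row of mixed values by converting each item to str first."""
    return sep.join(map(str, row))

ls = ['a', 'b', 4, 'c']
joined = join_row(ls)
print(joined)

# csv.writer stringifies non-string fields on its own
buf = io.StringIO()
csv.writer(buf).writerow(ls)
written = buf.getvalue()
print(written.strip())
```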
Monday, August 30, 2010
Wednesday, August 25, 2010
Dictionary as Generator
What do you do if you are building a dictionary dynamically and it ends up with millions of keys?
Accessing that dictionary later in your code can take a lot of resources, can't it?
I got stuck in this kind of situation when my dictionary grew to 10K million keys. So I used the dictionary as a generator to make my work easier.
The following code shows how to use a dictionary as a generator.
[code]
a = range(100000)
b = range(100000)
c = dict(zip(a, b))  # create dictionary with 100000 keys
d_len = len(c)
# generator expression; iterating c directly avoids building the c.keys() list first
d_keys = (k for k in c)
for i in range(d_len):
    key = next(d_keys)  # next() works in Python 2.6+ and 3
    # ... do your operation on key here
[/code]
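This sketch shows lazy key iteration, assuming Python 3. Iterating the dictionary directly already yields keys one at a time without building a separate list, and `itertools.islice` can hand them out in batches. The helper `iter_key_batches` and the batch size are illustrative assumptions, not part of the script above.

```python
from itertools import islice

def iter_key_batches(d, batch_size):
    """Yield lists of at most batch_size keys without copying all keys at once."""
    it = iter(d)  # lazy iterator over the keys
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

c = dict(zip(range(100000), range(100000)))
batches = iter_key_batches(c, 25000)
first = next(batches)
total = len(first) + sum(len(b) for b in batches)
print(len(first), total)
```

Only one batch of keys is held in memory at a time, which is the point of treating the dictionary as a generator.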
Monday, August 23, 2010
Rename multiple file simultaneously
Renaming multiple files at once from the command line is a little confusing. There are lots of ways to do it programmatically, but I didn't find any on-the-spot command for it.
So I used Python to do it simply.
My requirements were:
1) I have one dedicated folder in which all files have to be renamed.
2) All filenames to be renamed are structured; that is, I have to rename every dedupe_<number>.csv to <number>.csv
I did this using the following code:
import os

# rename every dedupe_<number>.csv in the current folder to <number>.csv
for filename in os.listdir(os.getcwd()):
    if not filename.startswith('.') and 'dedupe_' in filename:
        # splitext keeps any extra dots in the base name intact
        base, ext = os.path.splitext(filename)
        new_name = base.replace('dedupe_', '') + ext
        # os.rename avoids spawning a shell, so odd characters in names are safe
        os.rename(filename, new_name)
Isn't it very simple!
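The same idea can be sketched as a reusable function, assuming Python 3. The name `strip_prefix` is hypothetical, and the demo runs against a throwaway temporary directory so it never touches real files.

```python
import os
import tempfile

def strip_prefix(folder, prefix="dedupe_"):
    """Rename every file in folder that starts with prefix by removing the prefix."""
    renamed = []
    for filename in sorted(os.listdir(folder)):
        if filename.startswith(prefix):
            new_name = filename[len(prefix):]
            os.rename(os.path.join(folder, filename),
                      os.path.join(folder, new_name))
            renamed.append(new_name)
    return renamed

# try it on a throwaway directory
with tempfile.TemporaryDirectory() as folder:
    for name in ("dedupe_1.csv", "dedupe_2.csv", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    renamed = strip_prefix(folder)
    remaining = sorted(os.listdir(folder))
print(renamed, remaining)
```

Note that this sketch does not check whether the target name already exists; on POSIX systems `os.rename` silently replaces an existing file.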
Subversion with SSL
I recently installed SVN on my system and configured it with SSL. I am noting the steps here in case they help me later or help other people.
1. Install Apache (httpd)
sudo ./configure --prefix=/opt/vivek/apache --enable-dav --enable-so --enable-ssl
## if this gives you an error like "configure: error: ...No recognized SSL/TLS toolkit detected", then install the SSL packages:
## apt-get install openssl libssl-dev
sudo make
sudo make install
2. Install the dependencies for Subversion (check the dependencies using sh ./autogen.sh)
1. Install sqlite
2. Get the sqlite 3.6.13 amalgamation from:
http://www.sqlite.org/sqlite-amalgamation-3.6.13.tar.gz
Unpack the archive using tar/gunzip and copy sqlite3.c from the
resulting directory to:
/home/vivek/Desktop/TGZS/subversion-1.6.12/sqlite-amalgamation/sqlite3.c
This file also ships as part of the subversion-deps distribution.
3. You need autoconf version 2.50 or newer installed (I used Synaptic)
4. You need libtool version 1.4 or newer installed
3. Now install Subversion.
sudo ./configure --prefix=/opt/vivek/subversion --with-apxs=/opt/vivek/apache/bin/apxs --with-apr=/opt/vivek/apache/bin/apr-1-config --with-apr-util=/opt/vivek/apache/bin/apu-1-config --with-ssl
sudo make
sudo make install
4. After installation, create a group and a user for svn:
groupadd svn
useradd -m -d /srv/svn/ -g svn svn
After adding the user, I went to Users and Groups and enabled the account (set the password to 123456).
5. Switch to the svn user (enter its password, 123456) and create the repository:
$ su - svn
$ mkdir /srv/svn/repositories/
$ mkdir /srv/svn/repositories/myproduct
$ mkdir /srv/svn/conf
$ /opt/vivek/subversion/bin/svnadmin create /srv/svn/repositories/myproduct
6. Add the following to apache/conf/httpd.conf to give users HTTP access:
<Location /repos>
DAV svn
SVNParentPath /srv/svn/repositories
# our access control policy
AuthzSVNAccessFile /srv/svn/conf/users-access-file
# try anonymous access first, resort to real
# Authentication if necessary.
Satisfy Any
Require valid-user
# how to authenticate a user
AuthType Basic
AuthName "Subversion repository"
AuthUserFile /srv/svn/conf/passwd
</Location>
CustomLog logs/svn_logfile "%t %u %{SVN-ACTION}e" env=SVN-ACTION
That file, /srv/svn/conf/passwd, can be created using apache/bin/htpasswd:
htpasswd -m -c /srv/svn/conf/passwd vivek (use htpasswd --help first for options)
It will prompt you for a password for vivek.
** This way you can add users for HTTP access. Omit -c when adding more users, because -c creates the file again.
Add the following to /srv/svn/conf/users-access-file to set permissions for users:
[/]
* =
[myproduct:/]
vivek1 = rw
vivek2 = r
Run svnserve for the required location:
/opt/vivek/subversion/bin/svnserve -d -r /srv/svn/repositories/myproduct
7. Now access the URL http://localhost/repos/myproduct
8. Import the project and list it:
sudo /opt/vivek/subversion/bin/svn import myproduct file:///srv/svn/repositories/myproduct -m "added project"
/opt/vivek/subversion/bin/svn ls svn://localhost/myproduct
9. You can set permissions on the myproduct repository by editing /srv/svn/repositories/myproduct/conf/passwd and the svnserve.conf file.
Add the following to svnserve.conf:
[general]
anon-access = read
auth-access = write
password-db = passwd
authz-db = authz
# realm = My First Repository
[sasl]
use-sasl = true
Add the following to /srv/svn/repositories/myproduct/conf/authz:
[groups]
group1 = vivek1
group2 = vivek2
[/]
vivek = rw
*=
[myproduct:/]
@group1 = rw
# read-only: members of group2 can check out but cannot commit
@group2 = r
10. If you want to disable credential caching permanently, you can edit your runtime config file (located in /home/vivek/.subversion/config).
[auth]
store-auth-creds = no
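The access files in steps 6 and 9 are INI-like, so a rough sanity check can be sketched with Python's `configparser`. This is only an approximation: `configparser` is not Subversion's own parser, and the helper name `read_access_rules` is hypothetical. The sample text is the users-access-file from step 6.

```python
import configparser

def read_access_rules(text):
    """Parse authz-style text into {section: {user: permission}}."""
    # strict=True raises DuplicateSectionError if a section appears twice
    parser = configparser.ConfigParser(delimiters=("=",), strict=True)
    parser.optionxform = str  # keep user names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

sample = """
[/]
* =
[myproduct:/]
vivek1 = rw
vivek2 = r
"""
rules = read_access_rules(sample)
print(rules)
```

An empty value, as in `* =`, means no access for everyone else.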
Thanks to http://queens.db.toronto.edu/~nilesh/linux/subversion-howto/
Friday, August 20, 2010
Call Python script from Java.
I am not good at core Java programming, but I am good at "Hello World" kind of programs :) .
So I wrote a Java program that calls a Python script (it can also pass argument values). Take a look.
This is the Java code:
import java.io.*;

// run this way
// javac JavaRunCommand.java
// java -classpath . JavaRunCommand
public class JavaRunCommand {
    public static void main(String args[]) {
        String s = null;
        try {
            // command to run, followed by its arguments
            String[] callAndArgs = {"python", "my_python.py", "arg1", "arg2"};
            Process p = Runtime.getRuntime().exec(callAndArgs);
            BufferedReader stdInput = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            BufferedReader stdError = new BufferedReader(
                new InputStreamReader(p.getErrorStream()));
            // read the output
            while ((s = stdInput.readLine()) != null) {
                System.out.println(s);
            }
            // read any errors
            while ((s = stdError.readLine()) != null) {
                System.out.println(s);
            }
            System.exit(0);
        }
        catch (IOException e) {
            System.out.println("exception occurred");
            e.printStackTrace();
            System.exit(-1);
        }
    }
}
In the above Java code, I am calling the my_python.py script. That script might contain anything: wxPython, mod_python, CGI programming, just anything.
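For a quick test, my_python.py could be as small as the following sketch (assuming Python 3). The function name `describe_args` and the output format are hypothetical; the script just echoes the arguments it receives, so the Java program has something to read from the process output.

```python
import sys

def describe_args(argv):
    """Return one output line per argument passed after the script name."""
    return ["arg %d: %s" % (i, value)
            for i, value in enumerate(argv[1:], start=1)]

if __name__ == "__main__":
    for line in describe_args(sys.argv):
        print(line)  # Java reads these lines through p.getInputStream()
```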
Thursday, August 19, 2010
Optimize your code using dictionary
Today I was curious to know which method finds a key fastest in a dictionary. So I wrote a small script that gives me the execution time of a few methods.
A programmer can use this script to optimize their own programs.
import time

__doc__ = """
benchmark script for dict methods (Python 2: has_key and the print statement)
"""

a = range(10000)
b = range(9000, 10000)
c = dict(zip(a, b))  # zip stops at the shorter input, so c has 1000 keys

t1 = time.time()
k = c.keys()
if 98 in k:
    t2 = time.time()
    print 'if 98 in c.keys() :', (t2 - t1) * 1000

t1 = time.time()
if c.has_key(98):
    t2 = time.time()
    print 'c.has_key(98) :', (t2 - t1) * 1000

t1 = time.time()
if 98 in c:
    t2 = time.time()
    print 'if 98 in c: ', (t2 - t1) * 1000
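A single `time.time()` measurement of one lookup is very noisy, so here is a sketch of the same comparison using the standard `timeit` module, assuming Python 3 (where `has_key` no longer exists). The repeat count of 1000 is an arbitrary choice.

```python
import timeit

c = dict(zip(range(10000), range(10000)))

# membership test against a freshly built list of keys: linear scan
list_time = timeit.timeit(lambda: 98 in list(c.keys()), number=1000)
# membership test against the dict itself: hash lookup
dict_time = timeit.timeit(lambda: 98 in c, number=1000)

print("98 in list(c.keys()): %.3f ms" % (list_time * 1000))
print("98 in c:              %.3f ms" % (dict_time * 1000))
```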
Wednesday, August 18, 2010
Which one is better among ZODB, Pickle and Shelve?
I have worked with all three for my requirements.
I have a very large data file, at least 10GB, containing user records with a payments field. User ids repeat: there might be 1k records with the same user id but different payments, so I have to sum over the file to get the total payments for each user id.
I tried three approaches to do this:
ZODB Approach: I thought ZODB might help with this operation. I stored the user id as key and the payment as value, and updated the value in a loop. But this only works for small files. For bigger files it hangs the system, with Python taking all the memory and CPU. It seems impossible to do anything else while the ZODB program runs.
Pickle Approach: From what I googled about Pickle/cPickle, it is good for object serialization, so I thought I would try it for my work. I implemented the same algorithm (key-value, as in the ZODB approach), but Pickle ended with a "memory error". My system has 40GB of RAM and 100GB of swap. I also found that pickling took 98% memory and 40% CPU on average.
Shelve Approach: Shelve uses Pickle underneath, but it did my job well. I don't know exactly why Shelve works but Pickle does not; a likely reason is that Shelve pickles each value separately into a file on disk, so the whole dictionary never has to sit in memory at once. I followed the same algorithm (as in the ZODB approach). The only drawback I found is that while I/O is going on to the shelve file, it slows down other I/O. It takes 3-4% memory and 1% CPU on average.
So my choice is Shelve for bigger files, but ZODB/Pickle rock for data files smaller than 4 GB.
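The Shelve approach can be sketched roughly as follows, assuming Python 3 and assuming the records arrive as (user id, payment) pairs. The function name `sum_payments`, the sample records, and the file name are all hypothetical; a real run would stream the pairs from the data file instead of a list.

```python
import os
import shelve
import tempfile

def sum_payments(records, db_path):
    """Accumulate payments per user id in a shelve file instead of in memory."""
    with shelve.open(db_path) as db:
        for user_id, payment in records:
            key = str(user_id)  # shelve keys must be strings
            db[key] = db.get(key, 0) + payment
        return dict(db)  # copied out only because this sample is tiny

records = [(1, 10.0), (2, 5.5), (1, 2.5), (3, 1.0), (2, 4.5)]
with tempfile.TemporaryDirectory() as tmp:
    totals = sum_payments(records, os.path.join(tmp, "payments"))
print(totals)
```

For a 10GB input, the final `dict(db)` copy would be dropped and the totals read back from the shelve file key by key.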
Tuesday, August 17, 2010
Form submit using Twill
Recently, I found an excellent use for Python's Twill module. I had used Twill before, just to test multiple logins on one of my Plone sites. That was to check the load on my login script.
But some days back, I used Twill to fetch multiple users' details from a proxy site.
Here I am going to explain the little code I wrote. It might be helpful for others.
What am I doing here?
I am opening google.com and searching for the term "Twill".
The first thing is how to use the twill module in Python code:
download Twill from http://twill.idyll.org/
import twill
import twill.commands
t_com = twill.commands
## get the default browser
t_brw = t_com.get_browser()
## open the url
url = 'http://google.com'
t_brw.go(url)
## get all forms from that URL
all_forms = t_brw.get_all_forms()  ## this returns a list of form objects
## now, you have to choose only the form that uses the POST method
for each_frm in all_forms:
    attr = each_frm.attrs  ## all attributes of the form
    if each_frm.method == 'POST':
        ctrl = each_frm.controls  ## all control objects within that form (all html tags as controls inside the form)
        for ct in ctrl:
            if ct.type == 'text':  ## I did it as per my use, you can put your condition here
                ct._value = "twill"
                t_brw.clicked(each_frm, ct.attrs['name'])  ## clicked takes two parameters: the form object and the name of the button to be clicked
                t_brw.submit()
## you might write the output (submitted page) to any file using content = t_brw.get_html()
## don't forget to reset the browser and outputs.
t_com.reset_browser()
t_com.reset_output()