Some time ago we wrote an article introducing Python. It was fairly basic and only gave you a glimpse of the language. In many Python applications you may want to interact with the web without a web browser. In this tutorial we cover some ways to do that: getting data from the web and sending data to it.
The code was written for Python 2.6. If you are using Python 3, several things differ; the most visible change to keep in mind is that print became a function.
In Python 2.6 a simple hello world is this:
print "Hello World"
In Python 3.0 it looks like this:
print("Hello World")
Importing the Required Libraries
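Python 2 ships with two standard-library modules for interacting with the web: urllib and urllib2. (In Python 3 they were reorganized into urllib.request, urllib.parse and urllib.error.)
Here is a simple example illustrating basic use of urllib2: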
import urllib2
url = "http://levoltz.com"
website = urllib2.urlopen(url)
print website.read()
The code fetches the website http://levoltz.com and stores the response in website. This is a file-like object whose read() method returns the data retrieved from the site. The response object provides these methods:
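instance.read(): returns the data retrieved from the site.
instance.info(): returns the headers the server sent with the response. They contain a lot of useful information, including cookies (Set-Cookie) and the server type.
instance.geturl(): returns the URL of the resource actually retrieved. This differs from the requested URL if a redirect was followed.
instance.getcode(): returns the HTTP status code, e.g. 200. Redirects (such as 303) are followed automatically, and error codes (such as 404 or 505) make urlopen() raise urllib2.HTTPError instead.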
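Here is a minimal Python 3 sketch of the same calls, assuming urllib.request (urllib2's successor). So that it runs without internet access, it fetches a page from a throwaway local server built with http.server. LocalPageHandler and the page text are made up for the illustration.

```python
# Python 3 version: urllib2's urlopen() lives in urllib.request.
# A throwaway local server (hypothetical LocalPageHandler) stands in for
# a real website so the sketch runs without internet access.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello from a local test server</body></html>"


class LocalPageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), LocalPageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

website = urllib.request.urlopen(url)
data = website.read()          # page contents, as bytes in Python 3
headers = website.info()       # response headers sent by the server
final_url = website.geturl()   # URL of the resource actually retrieved
status = website.getcode()     # HTTP status code
website.close()

server.shutdown()
server.server_close()

print(status, final_url)
print(headers["Content-Type"])
print(data.decode())
```

Note that read() returns bytes in Python 3, so call decode() when you need text.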
Play around with these methods to get more accustomed to them. Below is an example that uses geturl():
import urllib2
url = "http://levoltz" # Try a redirecting website like tinyurl.com or bit.ly URLs
website = urllib2.urlopen(url)
if url == website.geturl():
    print "Website not redirected."
else:
    print "Website redirected you."
Here you open a URL and then check whether the URL of the retrieved page is the one you supplied. urlopen() follows redirects automatically, so if a redirect occurred, geturl() returns the final URL, which differs from the one you passed to urlopen().
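A Python 3 sketch of the same redirect check follows. It assumes a hypothetical local server whose /old path answers with a 302 redirect to /new. This stands in for a URL shortener, so the sketch runs offline.

```python
# Python 3 sketch of the geturl() redirect check. RedirectingHandler is a
# hypothetical local server: /old redirects to /new, like a URL shortener.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def was_redirected(url):
    """Return True if urlopen() ended up at a different URL than requested."""
    with urllib.request.urlopen(url) as website:  # redirects are followed automatically
        return website.geturl() != url


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

redirected = was_redirected(base + "/old")
not_redirected = was_redirected(base + "/new")
server.shutdown()
server.server_close()

print("Website redirected you." if redirected else "Website not redirected.")
```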
Let's do an HTTP POST request now. POST requests are pretty easy, though the code can look a little complicated at first, so don't worry. Before you look at the code, you might want to set up a server (or get some web space) so you can test it. A little PHP script like the one below will do the trick:
<?php
echo $_POST['test'];
?>
Note: Since this is a test page that won't stay on the server for long, I have not protected it against XSS. If that bothers you, pass the value through strip_tags() before echoing it.
Now let us use a function from a second module, urllib. Importing only that function is not necessary, but I like to do it this way: it is good practice to import only what we need, and here we need just urlencode().
import urllib2
from urllib import urlencode # new module and function
url = "http://localhost/test.php"
data = {'test': 'levoltz'}
# you can add as much info as you want to this dictionary
# "test" is the label for the data, so the PHP script above
# should display "levoltz".
encoded_data = urlencode(data)
# remember that this is from that imported module, normally you'd
# use this: urllib.urlencode(data) if you used a normal import.
website = urllib2.urlopen(url, encoded_data) # supplying data makes this a POST
print website.read() # That was pretty easy, right?
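For Python 3, here is a sketch of the same POST, assuming urlencode() from urllib.parse and data passed to urlopen() as bytes. To run offline, it posts to a hypothetical local handler that echoes the "test" field back, mimicking the PHP script above.

```python
# Python 3 sketch of the POST example: urlencode() moved to urllib.parse,
# and the encoded data must be bytes. EchoPostHandler is a hypothetical
# local stand-in for the PHP script, echoing the "test" field.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode


class EchoPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("ascii"))
        body = fields.get("test", [""])[0].encode("utf-8")  # like echo $_POST['test'];
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), EchoPostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/test.php" % server.server_port

data = {"test": "levoltz"}
encoded_data = urlencode(data).encode("ascii")  # "test=levoltz" as bytes
# Passing data makes urlopen() send a POST instead of a GET.
with urllib.request.urlopen(url, encoded_data) as website:
    reply = website.read().decode("utf-8")

server.shutdown()
server.server_close()
print(reply)  # levoltz
```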
HTTP Basic Authentication is a bit trickier. Here is the general pattern for opening URLs with extra features, including HTTP authentication:
import urllib2
url = "http://example.com"
openerDirective1 = … # a handler object, e.g. an urllib2.HTTPBasicAuthHandler
openerDirective2 = …
opener = urllib2.build_opener(openerDirective1, openerDirective2)
urllib2.install_opener(opener)
website = urllib2.urlopen(url)
Yes, it looks complicated. The "openerDirective"s are handler objects that change how urlopen() processes requests and responses, for example by adding headers such as authentication credentials. There can be more than one handler. Use the build_opener() function to combine them into an opener, then install it with install_opener(). After that, every request you make with urllib2.urlopen() goes through those handlers.
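To see what a handler can do, here is a Python 3 sketch built around a hypothetical AddHeaderHandler. It subclasses urllib.request.BaseHandler and defines http_request(), which the opener calls for each outgoing http:// request; this one adds a single header. A local server echoes that header back. The X-Greeting header name is invented for the example.

```python
# Python 3 sketch of build_opener()/install_opener() with a custom handler.
# AddHeaderHandler and the X-Greeting header are hypothetical examples.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AddHeaderHandler(urllib.request.BaseHandler):
    """Adds one fixed header to every http:// request the opener sends."""

    def __init__(self, header, value):
        self.header = header
        self.value = value

    def http_request(self, req):
        req.add_header(self.header, self.value)
        return req  # request processors must return the request


class EchoHeaderHandler(BaseHTTPRequestHandler):
    """Local test server: replies with the X-Greeting header it received."""

    def do_GET(self):
        body = self.headers.get("X-Greeting", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), EchoHeaderHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

openerDirective1 = AddHeaderHandler("X-Greeting", "say hi to levoltz")
opener = urllib.request.build_opener(openerDirective1)
urllib.request.install_opener(opener)  # plain urlopen() now uses this opener

with urllib.request.urlopen(url) as website:
    echoed = website.read().decode("utf-8")

server.shutdown()
server.server_close()
print(echoed)
```

build_opener() still adds the default handlers (HTTP, redirects, errors and so on) alongside the ones you pass, so only the extra behaviour needs to be written.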
Here is the HTTP Basic Authentication handler:
authDirective = urllib2.HTTPBasicAuthHandler()
realm = "Webmail"
url = "http://example.com/webmail/"
username = "leethaxxer"
password = "letmein"
authDirective.add_password(realm, url, username, password)
Then we just build the opener and install it, as in the basic code above:
opener = urllib2.build_opener(authDirective)
urllib2.install_opener(opener)
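Here is a Python 3 sketch of the same handler in action. It assumes a hypothetical local server that protects /webmail/ with Basic authentication in the "Webmail" realm. The first request gets a 401 challenge, and HTTPBasicAuthHandler then retries with the stored username and password.

```python
# Python 3 sketch of HTTPBasicAuthHandler against a hypothetical local
# server that requires Basic auth in the "Webmail" realm.
import base64
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

USERNAME, PASSWORD = "leethaxxer", "letmein"
EXPECTED = "Basic " + base64.b64encode(
    ("%s:%s" % (USERNAME, PASSWORD)).encode("ascii")).decode("ascii")


class WebmailHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"Welcome to your webmail"
            self.send_response(200)
        else:
            # Challenge the client; the auth handler answers this 401.
            body = b"Authentication required"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Webmail"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), WebmailHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/webmail/" % server.server_port

authDirective = urllib.request.HTTPBasicAuthHandler()
authDirective.add_password("Webmail", url, USERNAME, PASSWORD)
opener = urllib.request.build_opener(authDirective)
urllib.request.install_opener(opener)

with urllib.request.urlopen(url) as website:
    page = website.read().decode("utf-8")

server.shutdown()
server.server_close()
print(page)
```

The realm passed to add_password() has to match the realm in the server's WWW-Authenticate header. Otherwise the handler will not find the credentials, and urlopen() raises an HTTPError for the 401.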
We will try to cover cookie handling with urllib2 in another article. For now, let's move on to socket programming in Python, including how to send a cookie over a raw socket.
Socket Programming in Python
Socket programming is one of the foundations of network security work, and it is well worth learning.
Here is a simple example to start with:
import socket
s = socket.socket()
host = "www.example.com"
port = 80
addr = (host, port)
s.connect(addr)
s.send("say hi to levoltz")
print s.recv(1024)
# 1024 is the buffer size: the maximum number of bytes read in one
# call. You don't need to worry about it much right now.
s.close()
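We created a socket, connected to www.example.com on the HTTP port (80), sent the message 'say hi to levoltz' and printed what we received back. Because that message is not a valid HTTP request, don't be surprised if the server answers with an error or simply waits for more data. Closing the socket is not strictly required, but it is good practice.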
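In Python 3, send() and recv() work with bytes rather than str, so the message has to be encoded. Here is a minimal sketch under that assumption. A hypothetical local echo server replaces www.example.com so it runs offline and actually gets a reply. It uses sendall(), because send() may transmit only part of the data.

```python
# Python 3 sketch of the socket example, talking to a hypothetical
# local echo server instead of www.example.com.
import socket
import threading

listener = socket.socket()            # TCP over IPv4 by default
listener.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
listener.listen(1)


def echo_once():
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))  # send back whatever arrived


threading.Thread(target=echo_once, daemon=True).start()

s = socket.socket()
addr = listener.getsockname()         # (host, port) of the echo server
s.connect(addr)
s.sendall("say hi to levoltz".encode("ascii"))
reply = s.recv(1024)                  # read at most 1024 bytes
s.close()
listener.close()

print(reply.decode("ascii"))
```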
Play around with this and you will find better things to send, for instance:
GET /index.html HTTP/1.1\r\n
Host: www.example.com\r\n
\r\n
That’s a simple HTTP GET request, asking for “index.html”.
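Here is a Python 3 sketch that sends a raw GET like this over a socket and reads the whole reply. It assumes a hypothetical local http.server in place of www.example.com (it ignores the Host header). The extra "Connection: close" header asks the server to hang up after replying, so reading until recv() returns empty bytes collects the full response.

```python
# Python 3 sketch: a raw HTTP GET over a socket to a hypothetical local
# server (IndexHandler) standing in for www.example.com.
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class IndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/index.html":
            self.send_error(404)
            return
        body = b"index page"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), IndexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Connection: close\r\n"
    "\r\n"                      # blank line ends the headers
)

s = socket.create_connection(("127.0.0.1", server.server_port))
s.sendall(request.encode("ascii"))
chunks = []
while True:
    chunk = s.recv(1024)
    if not chunk:               # empty bytes: the server closed the connection
        break
    chunks.append(chunk)
s.close()
server.shutdown()
server.server_close()

response = b"".join(chunks).decode("ascii")
status_line = response.split("\r\n", 1)[0]
print(status_line)  # http.server replies with HTTP/1.0 by default
```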
Here's a POST request:
POST /index.php HTTP/1.1\r\n
Host: www.example.com\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Content-Length: 11\r\n
\r\n
hello=world
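Counting Content-Length by hand is error-prone. Here is a small Python 3 sketch that computes it from the encoded body instead. build_post_request is a hypothetical helper name. The Content-Type header is there because PHP only fills $_POST for form-encoded (or multipart) bodies.

```python
# Python 3 sketch: build a raw POST whose Content-Length is computed from
# the encoded body. build_post_request is a hypothetical helper.
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    body = urlencode(fields).encode("ascii")
    head = (
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (path, host, len(body))
    )
    return head.encode("ascii") + body


request = build_post_request("www.example.com", "/index.php", {"hello": "world"})
print(request.decode("ascii"))
```

The resulting bytes can be passed straight to a connected socket's sendall().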
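Now let's add a cookie to an HTTP GET. Clients send cookies in the Cookie header; Set-Cookie is the header a server uses to hand cookies out:
GET /index.html HTTP/1.1\r\n
Host: www.example.com\r\n
Cookie: hello=world\r\n
\r\n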
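A Python 3 sketch of building that request from a dictionary of cookies follows. Several cookies go in one Cookie header, separated by "; ". The helper names are hypothetical. The last lines show how the receiving side could parse the header value back with http.cookies.SimpleCookie.

```python
# Python 3 sketch: a GET request carrying cookies in a Cookie header.
# cookie_header_value and build_get_with_cookies are hypothetical helpers.
from http.cookies import SimpleCookie


def cookie_header_value(cookies):
    """Join name=value pairs the way a client sends them in a Cookie header."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def build_get_with_cookies(host, path, cookies):
    return (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Cookie: %s\r\n"
        "\r\n" % (path, host, cookie_header_value(cookies))
    ).encode("ascii")


request = build_get_with_cookies("www.example.com", "/index.html", {"hello": "world"})
print(request.decode("ascii"))

# On the receiving side, SimpleCookie can parse the header value back.
parsed = SimpleCookie()
parsed.load(cookie_header_value({"hello": "world"}))
print(parsed["hello"].value)  # world
```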
This is a very basic article to get you started with web interaction in Python. There are many more socket modes and options that can be set; check http://www.amk.ca/python/howto/sockets/ for more.