Using Python to Grab Images From a Web Site

I recently started a contest for a logo design (the link is not to my contest, just an example). Soon I had over 60 entries and I needed an easy way to present these logos to the client in a PowerPoint presentation. On the site it takes two clicks to get to each image… no good. Thus the following script was created. It should serve as a good tutorial on how to use Python to do some basic web interactions.

#!/usr/bin/env python3
import re
import urllib.request

# Download a single entry image and save it under "path"
def retrieveImage( contest, path, name ):
    url = "http://99designs.com/contests/" + contest + "/entries/" + name
    urllib.request.urlretrieve( url, path + name )

if __name__ == '__main__':
    # Change the variables "contest" and "path"
    contest = "6999" #The 99designs contest number from which you want to extract images
    path = "../tmp/" #The path where you want to store the downloaded images (must already exist)
    url = urllib.request.urlopen( "http://99designs.com/contests/" + contest + "/feed" )
    url_string = url.read().decode( "utf-8", "replace" )
    # Match entry file names of the form <digits>.large.<three-character extension>
    p = re.compile( r'\d*\.large\.\w{3}' )
    iterator = p.finditer( url_string )
    for match in iterator:
        retrieveImage( contest, path, match.group() )
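
To see what the regular expression picks up without touching the network, here is a minimal sketch that runs the pattern against a made-up feed snippet. The sample markup, the entry numbers, and the helper names find_entry_names and entry_url are assumptions for illustration only. The real 99designs feed may look different, and the pattern is assumed to be the backslash-escaped form \d*\.large\.\w{3}.

```python
import re

# Hypothetical feed fragment; the real 99designs feed markup may differ.
SAMPLE_FEED = """
<item><img src="http://99designs.com/contests/6999/entries/101.large.jpg"/></item>
<item><img src="http://99designs.com/contests/6999/entries/102.large.png"/></item>
<item><img src="http://99designs.com/contests/6999/entries/103.small.jpg"/></item>
"""

# Optional digits, then ".large.", then a three-character extension.
ENTRY_PATTERN = re.compile(r"\d*\.large\.\w{3}")


def find_entry_names(feed_text):
    """Return every matching entry file name in feed_text, in order."""
    return [match.group() for match in ENTRY_PATTERN.finditer(feed_text)]


def entry_url(contest, name):
    """Build the download URL the same way retrieveImage does."""
    return "http://99designs.com/contests/" + contest + "/entries/" + name


if __name__ == "__main__":
    for name in find_entry_names(SAMPLE_FEED):
        print(entry_url("6999", name))
```

Only the two ".large." entries are matched. The ".small." one is skipped, which is what keeps thumbnails out of the download folder.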

Comments

  1. btruelove says:

    Overkill IMO. A less trivial example is going to get large and messy quickly. What about when the XML file is on an FTP server, or requires authentication, needs some cookie, has to filter downloads by file size, wants to use a proxy, spawn multiple processes and so on? A more apt tool is curl or wget (I’m lazy so I used both). Also, when you keep it at the shell it’s more natural to pull in other shell commands when needed.

    curl -s http://99designs.com/contests/6999/feed | grep -Po "src=\".*(png|jpg)" | grep -o "http.*" | xargs wget -q

  2. Great example. Thanks for your contribution.

  3. Great tip using grep. I was going to use Python, too. :) My box doesn’t have the -P option :( but I used egrep to similar effect. Also, the items I needed were in tags so I had to remove those with a sed command. The images were also retrieved from a database and didn’t have an extension so I did a one-liner loop to rename those.

    curl -s http://domain.tld/feed | egrep -o ".*" | egrep -o "(http.*)" | sed -e 's/<[^>]*>//g'
    for f in *; do mv ./"$f" "${f}.jpg"; done

  4. Ah… forgot to add the

    | xargs wget -q

    at the end of the curl/egrep line to do the actual downloading.
