Create Your Own Search Engine with Python

The ability to search a specific web site for the page you are looking for is a very useful feature. However, searching can be complicated, and providing a good search experience can require knowledge of multiple programming languages. This article will demonstrate a simple search engine, including a sample application you can run on your own site. This sample application is also a good introduction to the Python programming language.
This application is a combination of Python, JavaScript, CSS (Cascading Style Sheets), and HTML. You can run this application on any server which supports CGI and has Python installed. This application was tested with Python version 2.5.1. I ran this application with the Apache HTTP server. The JavaScript and style sheets for this page have been tested with Internet Explorer, Firefox, and Safari.
The code in this application is free and is released under the Apache 2.0 license. That means you are welcome to use, copy, and change this code as much as you would like. If you find any bugs, have any comments, or make any improvements I would love to hear from you. There are a couple of other programs needed to run this application. They are all free, but some of them use different licenses. You should make sure to read and understand each license before using a product.

Get the Source Code

You should start by downloading the source code for this sample. It can be found here. Once you have downloaded it you can unzip it to a work directory on your computer.

Other Programs

This program has been designed to run with the Python interpreter. You will need to have Python installed in order to use this program. If you do not already have Python you must download and install it before you run this sample.
This program can be run locally for testing, but it is meant to be run along with an HTTP server. This program will run with any HTTP server which supports CGI, but it has only been tested with the Apache HTTP server.

Run the Sample

Once you have installed Python and the Apache HTTP server you can run this program using the following steps. These steps will write an HTML document containing search results to the system console. You can redirect this output to a file and open that file in your web browser. Depending on your system configuration, you may need to either add the Python executable to your path or give the full path to that executable.
  1. Unzip the samples archive to a directory on your machine.
  2. Open a command prompt and change directories to the location you unzipped the sample in.
  3. You can run the command python search.py > searchoutput.html to test this sample locally.
This application has been configured to run via the command line interface for easy access and testing. Configuration for a web server will be discussed later in this article.

Core Technologies

This program will use the following core technologies:
  • Python
  • JavaScript
  • Cascading Style Sheets
  • HTML
This application is meant to be a useful sample of a web site search engine. It is also a good introduction to Python, CSS, JavaScript, and HTML. This sample will demonstrate how these four technologies can work together to create a rich and configurable user interface for searching your applications.

Why Python

There are a lot of web scripting languages and tools available. Perl and Ruby come quickly to mind, but there are many more. Python is a dynamically typed, object-oriented language. Unlike Java, Python lets you reassign a variable to a value of a different type, and it does not require all code to live inside a class. Python can also work more like a traditional scripting language with less object use.
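The two points above, reassigning a name to a different type and writing code outside of any class, can be seen in a few lines. This is only a small illustration written for Python 3 (the sample itself was tested with Python 2.5.1), and the count_words function is a made-up name, not part of the sample.

```python
# A name can be rebound to values of different types at run time.
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str


# Functions can live at module level; no enclosing class is required.
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


print(count_words("a simple search engine"))  # 4
```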
Perl has a specialized syntax which can be difficult to learn, and Ruby is most commonly used with the Rails framework. Both are very popular, and this application could easily have been written in either of them. The benefits of Perl vs. Ruby vs. Python have been debated many times and I will not go over them here. Python just happens to be the language I was most interested in when I first wrote this code.

How It Works

This application works as a combination of four technologies. Some of the code in this application will run on your server and some will run in the browser. It is important to remember the context in which the code will run when creating it.
[Figure: search flow]
This sample includes a search form named Search.html. You can customize this file as much as you want, but you must make sure that the names of the form controls remain the same. This form specifies an action URL of /cgi-bin/search.py. You may have to change this URL to reflect the location where you have placed the search script on your web server. Once the user enters search terms and presses the search button, the data will be sent to the search.py script on the server. This script will take the search terms, do the actual search, and return the search results.
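To make the server-side step more concrete, here is a rough sketch of how a script could pull the search terms out of a submitted form and match them against some pages. It is not the code from search.py. The field name "terms", the PAGES dictionary, and both function names are assumptions made up for this illustration, and it is written for Python 3 rather than the Python 2.5.1 the sample was tested with.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory "site": page URL -> page text.
PAGES = {
    "/index.html": "Welcome to the Python search engine sample",
    "/about.html": "This site runs on the Apache HTTP server",
}


def get_search_terms(query_string, field="terms"):
    """Return the lower-cased words typed into the (assumed) 'terms' field."""
    values = parse_qs(query_string).get(field, [""])
    return values[0].lower().split()


def find_pages(terms, pages=PAGES):
    """Return the URLs of pages whose text contains every search term."""
    if not terms:
        return []
    return sorted(
        url for url, text in pages.items()
        if all(term in text.lower() for term in terms)
    )


if __name__ == "__main__":
    # A CGI script would normally read this from the QUERY_STRING variable.
    terms = get_search_terms("terms=python+search")
    print(find_pages(terms))
```

A real script would read the page text from files on the server instead of a dictionary, but the flow is the same: decode the form data, split it into terms, and collect the pages that match.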
The search results page will be generated based on the SearchResults.html file which must be placed in the same directory as the search.py script. This HTML file contains two special values ${SEARCH_TERMS_GO_HERE} and ${SEARCH_RESULTS_GO_HERE}. These values will be replaced with the search terms and the search results respectively. Each of the search results contains a link to the page where the terms were found and some special information for the JavaScript in each page to use when highlighting the search terms. When the user clicks on one of these links they will get the HTML page containing the search terms with each term highlighted when it appears in the text.
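The placeholder replacement described above can be sketched with Python's string.Template class, which happens to use the same ${NAME} syntax. This is one possible approach and not necessarily how search.py does it. The TEMPLATE_TEXT string stands in for the contents of SearchResults.html, render_results is a hypothetical name, and the extra highlighting information carried by each real result link is left out. The sketch targets Python 3.

```python
from html import escape
from string import Template

# Stand-in for the contents of SearchResults.html.
TEMPLATE_TEXT = (
    "<h1>Results for ${SEARCH_TERMS_GO_HERE}</h1>\n"
    "<ul>${SEARCH_RESULTS_GO_HERE}</ul>"
)


def render_results(template_text, terms, result_urls):
    """Fill both placeholders, escaping user input before it reaches the page."""
    items = "".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(url, quote=True))
        for url in result_urls
    )
    # safe_substitute leaves any other "$" text in the HTML untouched.
    return Template(template_text).safe_substitute(
        SEARCH_TERMS_GO_HERE=escape(terms),
        SEARCH_RESULTS_GO_HERE=items,
    )


if __name__ == "__main__":
    print(render_results(TEMPLATE_TEXT, "python search", ["/index.html"]))
```

Escaping the search terms matters because they come straight from the user; without it, a search for a piece of HTML would be inserted into the results page as markup.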
Each page with highlighting enabled must contain a couple of small code references. Somewhere in the header of each HTML file you must import the JavaScript and CSS files which know how to handle the search. That code looks like this: