Create Your Own Search Engine with Python
The ability to search a specific web site for the page you are looking
for is a very useful feature. However, searching can be complicated, and
providing a good search experience can require knowledge of multiple
programming languages. This article demonstrates a simple search
engine, including a sample application you can run on your own site.
The sample application is also a good introduction to the Python
programming language.
This application is a combination of Python, JavaScript, CSS (Cascading Style Sheets), and HTML. You can run this application on any server which supports CGI and has Python installed. This application was tested with Python version 2.5.1. I ran this application with the Apache HTTP server. The JavaScript and style sheets for this page have been tested with Internet Explorer, Firefox, and Safari.
The code in this application is free and is released under the Apache 2.0 license. That means you are welcome to use, copy, and change this code as much as you would like. If you find any bugs, have any comments, or make any improvements I would love to hear from you. There are a couple of other programs needed to run this application. They are all free, but some of them use different licenses. You should make sure to read and understand each license before using a product.
Get the Source Code
You should start by downloading the source code for this sample. It can be found here. Once you have downloaded it, you can unzip it to a work directory on your computer.
Other Programs
This program is designed to run with the Python interpreter, so you will need to have Python installed in order to use it. If you do not already have Python, you must download and install it before you run this sample.
This program can be run locally for testing, but it is meant to be run along with an HTTP server. It should run on any HTTP server which supports CGI, but it has only been tested with the Apache HTTP server.
Run the Sample
Once you have installed Python and the Apache HTTP server you can run this program using the following steps. These steps will write an HTML document containing search results to the system console. You can redirect this output to a file and open that file in your web browser. Depending on your system configuration, you may need to either add the Python executable to your path or give the full path to that executable.
- Unzip the samples archive to a directory on your machine.
- Open a command prompt and change to the directory where you unzipped the sample.
- You can run the command
python search.py > searchoutput.html
to test this sample locally.
Core Technologies
This program will use the following core technologies:
- Python
- JavaScript
- Cascading Style Sheets
- HTML
Why Python
There are a lot of web scripting languages and tools available. Perl and Ruby come quickly to mind, but there are many more. Python is a dynamically typed, object-oriented language. Unlike Java, Python lets you rebind a variable to a value of a different type. Python also does not require all code to be inside a class the way Java does, so it can work more like a traditional scripting language with less use of objects.
Perl has a specialized syntax which can be difficult to learn, and Ruby is most commonly used with the Rails framework. They are both very popular, and this application could easily have been written in either of them. The benefits of Perl vs. Ruby vs. Python have been debated many times and I will not go over them here. Python just happens to be the language I was most interested in when I first wrote this code.
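To make the comparison with Java concrete, here is a small sketch of the two Python traits described above: a name being rebound to a value of a different type, and a function defined outside of any class. The names `value` and `describe` are made up for illustration, and the sketch uses Python 3 syntax rather than the 2.5.1 release this application was tested with.

```python
# A name can be rebound to values of different types at run time.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__


# Functions can live at module level; no enclosing class is required.
def describe(item):
    return "%s is a %s" % (item, type(item).__name__)


print(first_type, second_type)
print(describe(value))
```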
How It Works
This application works as a combination of four technologies. Some of the code in this application will run on your server and some will run in the browser. It is important to remember the context in which the code will run when writing it.
This sample includes a search form named Search.html. You can customize this file as much as you want, but you must make sure that the names of the form controls remain the same. This form specifies an action URL of /cgi-bin/search.py. You may have to change this URL to reflect the location where you have placed the search script on your web server. Once the user enters search terms and presses the search button, the data will be sent to the search.py script on the server. This script will take the search terms, do the actual search, and return the search results.
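The server-side flow described above can be sketched roughly as follows. This is not the code from search.py; it is a minimal illustration that assumes the form is submitted with GET, that the text field is named `terms` (a hypothetical name), and that the pages to search are held in a small in-memory dictionary. It uses Python 3 syntax.

```python
import os
from urllib.parse import parse_qs


def read_search_terms(query_string, field="terms"):
    """Return the words typed into the (hypothetical) 'terms' form field."""
    fields = parse_qs(query_string)
    raw = fields.get(field, [""])[0]
    return raw.split()


def find_matches(terms, pages):
    """Return the paths whose text contains every term, ignoring case."""
    wanted = [t.lower() for t in terms]
    return [path for path, text in sorted(pages.items())
            if wanted and all(t in text.lower() for t in wanted)]


if __name__ == "__main__":
    # A CGI server passes GET form data in the QUERY_STRING variable.
    terms = read_search_terms(os.environ.get("QUERY_STRING", ""))
    # Stand-in for the real site content (an assumption for this sketch).
    pages = {"/index.html": "Welcome to the Python search sample"}
    print("Content-Type: text/plain")
    print()
    print(find_matches(terms, pages))
```

A real script would read the site's HTML files instead of a dictionary and would return an HTML page rather than plain text.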
The search results page will be generated based on the SearchResults.html file, which must be placed in the same directory as the search.py script. This HTML file contains two special values, ${SEARCH_TERMS_GO_HERE} and ${SEARCH_RESULTS_GO_HERE}.
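One way placeholders like these could be filled in is plain string replacement. The sketch below is an assumption about the approach, not the code from search.py: the helper name `fill_template` and the sample template text are hypothetical, and the syntax is Python 3. The search terms are HTML-escaped because they come from the user.

```python
import html


def fill_template(template, terms, results_html):
    """Substitute the two placeholder values with the terms and results."""
    page = template.replace("${SEARCH_TERMS_GO_HERE}", html.escape(terms))
    return page.replace("${SEARCH_RESULTS_GO_HERE}", results_html)


# Stand-in for the contents of SearchResults.html (hypothetical markup).
template = "<h1>Results for ${SEARCH_TERMS_GO_HERE}</h1>\n${SEARCH_RESULTS_GO_HERE}"
print(fill_template(template, "python <cgi>", '<a href="/index.html">Home</a>'))
```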
These values will be replaced with the search terms and the search results respectively. Each of the search results contains a link to the page where the terms were found and some special information for the JavaScript in each page to use when highlighting the search terms. When the user clicks on one of these links they will get the HTML page containing the search terms, with each term highlighted where it appears in the text.
Each page with highlighting enabled must contain a couple of small code references. Somewhere in the header of each HTML file you must import the JavaScript and CSS files which know how to handle the search. That code looks like this: