What is CAPTCHA and how does it work?

If you're on this site, you're probably tired to death of unceasing spambot attacks on your site, forum or blog. This article, with a real working example of a CAPTCHA PHP script based on TheCAPTCHA, can help you solve the problem and protect your site against spambots.

I. Theory

Spambots are ubiquitous. They fill in forms on web sites, send SMS through web interfaces and take part in online polls; in short, they do all the things you expect from a real human. This situation pushed security specialists to create a number of protection methods against spambots, and spammers, in turn, keep trying to break through the shields.

Human or robot?

To determine whether a site visitor is a human or a robot, he (or it) has to accomplish a task that is easy for a human and hard or even impossible for a robot. These tasks are commonly called CAPTCHAs; the name is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”. It refers to Alan Turing, who in his 1950 article “Computing Machinery and Intelligence” described the “imitation game”, a test of a machine's capability to demonstrate intelligence. In its commonly cited form it proceeds as follows: a human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test. It is assumed that both the human and the machine try to appear human.

CAPTCHA is a form of reverse Turing test: here the judge is a machine. When, for example, logging on to a web site, the user is presented with a word or number in a distorted graphic image and asked to enter it. If the value entered does not match what is expected, the user is rejected. There are various implementations of CAPTCHA; the most popular at the moment are:

  1. An image with random letters and digits, sometimes distorted, sometimes with some noise over the text. The idea is that a human can recognize distorted text while a spambot can't.
  2. An image with a random mathematical expression that the visitor has to calculate. For example, if the expression is 47+39, the visitor has to enter the right answer, which in this case is 86.
  3. An audio file with random letters or digits read aloud; the sound is sometimes distorted. The visitor has to type what he heard into the form.
  4. A random question or a riddle. For example, the visitor has to give Shakespeare's first name.
  5. A set of images of various objects, for example cats, dogs and birds. The visitor has to select only the objects of the kind specified by the test, for example only cats.

Which way is the best? The last two are the worst, because in order to be effective against spambot attacks you have to build a prohibitively large database of images or questions. Another disadvantage of these methods is that a human visitor has to be familiar with the language of the question. The answer to a riddle can easily be added to a spambot's database, but the riddle may be impossible to answer for those who are not native speakers of the language.

Audio CAPTCHA is not the best way either, because the visitor needs audio equipment on his computer (many workers in big corporations with strict computer policies don't have any).

You may think that a math CAPTCHA is quite a good idea, because a spambot has to recognize the digits and operators and then calculate the expression. No, man: once the symbols are recognized, the arithmetic is the easy part, and computers calculate much faster than humans :)
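
To show how little the arithmetic adds, here is a minimal sketch of what a bot might do after it has read the characters from the image. The function name solve_math_captcha() is hypothetical, and the sketch assumes the expression consists of two whole numbers joined by +, - or *.

```php
<?php
// Hypothetical helper: solves an already recognized expression such as "47+39".
// Reading the characters from the image is assumed to have happened elsewhere.
function solve_math_captcha($expression) {
    // Two non-negative integers separated by +, - or *
    if (!preg_match('/^\s*(\d+)\s*([+\-*])\s*(\d+)\s*$/', $expression, $m)) {
        return null; // not an expression this sketch understands
    }
    $a = (int) $m[1];
    $b = (int) $m[3];
    switch ($m[2]) {
        case '+': return $a + $b;
        case '-': return $a - $b;
        default:  return $a * $b;
    }
}

echo solve_math_captcha('47+39'), "\n"; // prints 86
```

So the only real obstacle in a math CAPTCHA is the same one as in a text CAPTCHA: recognizing the symbols.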

So, what remains? A graphic CAPTCHA with letters and digits. This is the easiest method to implement, it is quite secure if you follow the rules described below, and in this article I'll teach you how to create a strong but simple PHP script to protect your web site.

Abordage

Creating a graphic CAPTCHA is always a balance between the abilities of humans and of bots to recognize distorted and noisy text. If the text is distorted too much, there will be no form submissions on your site at all, and this is not what we want. On the other hand, a weak CAPTCHA with ineffective distortion and noise, or even without any distortion at all, is like a toy shield against a viking axe.

How do spambots recognize symbols? The process can be divided into 2 steps:

  1. Locating the position and borders of each symbol.
  2. Then, once that is done, recognizing each symbol.

If symbols are always in the same places with fixed spaces between them, your CAPTCHA is weak. That's why symbols should always have different coordinates:

An example of weaker and stronger CAPTCHAs

The task gets much more difficult if the symbols on your CAPTCHA always have random position, angle and spacing. Once the position of each symbol has been found, the next step is to compare the color of the background with the color of what is assumed to be a symbol. If these colors are different (and they must be different to be easily read by a human), recognition by spambots is quite easy, so you have to either:

  1. put some noise under and over the symbols, or
  2. place the symbols so close to each other that they form a single block with no spaces between them, and finally distort the text.

The second method is more powerful but also much more complicated to implement. When there's no clue as to where a symbol starts and where it ends, the task becomes almost impossible for spambots, though still easy for a human being. However, if the noise you put on your CAPTCHA is of the right kind, the first method can be quite hard to break, and in this article I will use this very method to show you how to make your own CAPTCHA script. The same method is used in my TheCAPTCHA script.
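
The "find the borders" step can be sketched as a simple column scan. This is only an illustration under strong assumptions: the image has already been reduced to a black-and-white bitmap, represented here as strings where '#' is ink and '.' is background, and the function name segment_columns() is hypothetical. Real spambots use more elaborate techniques.

```php
<?php
// Hypothetical sketch: split a bitmap wherever a column contains no ink.
// $rows is an array of equal-length strings, '#' = ink, '.' = background.
function segment_columns(array $rows) {
    $width = strlen($rows[0]);
    $segments = array();
    $start = null;
    for ($x = 0; $x < $width; $x++) {
        $has_ink = false;
        foreach ($rows as $row) {
            if ($row[$x] === '#') {
                $has_ink = true;
                break;
            }
        }
        if ($has_ink && $start === null) {
            $start = $x;                          // a symbol begins here
        } elseif (!$has_ink && $start !== null) {
            $segments[] = array($start, $x - 1);  // it ended in the previous column
            $start = null;
        }
    }
    if ($start !== null) {
        $segments[] = array($start, $width - 1);  // symbol touching the right edge
    }
    return $segments;
}

// Three symbols separated by empty columns
$spaced = array(
    '##..##..##',
    '#...#...#.',
    '##..##..##',
);
// Symbols pushed together so that every column contains some ink
$touching = array(
    '##########',
    '#.#.#.#.#.',
    '##########',
);
echo count(segment_columns($spaced)), "\n";   // prints 3
echo count(segment_columns($touching)), "\n"; // prints 1
```

With clean gaps the scan finds every symbol; once the symbols touch (or noise of the same color fills the gaps) it finds one block and the bot has nothing to cut along.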

The right noise

There are several types of noise commonly used in CAPTCHA scripts to hinder the recognition of symbols by spambots:

  1. Pixel noise, sometimes of random color; it looks like old grainy film or ISO 3200 images from your digital camera.
  2. Lines, sometimes of random color and angle; sometimes they form a kind of grid.
  3. Rectangles and/or circles, sometimes filled with color.

The first type is the weakest and there is no sense in using it, as it only makes the symbols harder to recognize for a human, not for a spambot.

The second and the third are good only when they are random. As you can see, when creating a CAPTCHA you have to forget anything systematic. Any well-ordered structure in an image is a hole in your hauberk. If the creator of a spambot knows that some particular image contains a regular grid whose lines always have the same angle, he just subtracts the grid from the image, making it useless. The same goes for rectangles and circles. That's why you need to put this kind of noise under and over the symbols only on an irregular basis. And note that the color of these objects must be either the same as the color of the background or the same as the color of the symbols, which makes it harder for spambots to find the position and borders of each symbol.
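
Here is a minimal sketch of why a regular grid protects nothing. It uses a bitmap made of strings ('#' is ink, '.' is background) and assumes the attacker knows that vertical grid lines sit at every fifth column; the function name strip_regular_grid() is hypothetical, and the sketch ignores the case where a grid line crosses a symbol.

```php
<?php
// Hypothetical sketch: erase vertical grid lines that are known to sit at
// columns 0, $period, 2 * $period, ...
function strip_regular_grid(array $rows, $period) {
    $period = max(1, (int) $period); // guard against a zero or negative step
    $clean = array();
    foreach ($rows as $row) {
        for ($x = 0; $x < strlen($row); $x += $period) {
            $row[$x] = '.'; // overwrite the known grid column with background
        }
        $clean[] = $row;
    }
    return $clean;
}

// Grid lines at columns 0 and 5, symbol ink at columns 2-3 and 7-8
$noisy = array(
    '#.##.#.##.',
    '#.#..#.#..',
    '#.##.#.##.',
);
echo implode("\n", strip_regular_grid($noisy, 5)), "\n";
```

A few lines of code remove the whole grid. When the lines have random positions and angles, there is no such shortcut.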

II. Practice

So let's begin with the script. All you need is an HTTP server and PHP 4 or 5 with GD 2 support. The GD library installed on your hosting has to support TTF (TrueType) fonts.
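
If you are not sure what your hosting provides, a quick check like the one below may help. It is only a sketch: the function name captcha_check_requirements() is hypothetical, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper: reports whether the pieces this tutorial relies on
// are available in the current PHP build.
function captcha_check_requirements() {
    $gd = extension_loaded('gd');
    return array(
        'gd'       => $gd,
        // imagettftext() only exists when GD was built with FreeType support
        'truetype' => $gd && function_exists('imagettftext'),
        'png'      => $gd && function_exists('imagepng'),
    );
}

foreach (captcha_check_requirements() as $feature => $available) {
    echo $feature, ': ', ($available ? 'yes' : 'no'), "\n";
}
```

If any line says "no", ask your hosting provider to enable GD with FreeType and PNG support before going further.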

Create a folder named “captcha” under your site's document root, and create the files “captcha.function.php”, “captcha.image.php” and “form.php” in it. The first will contain the functions that generate a CAPTCHA image and verify the word provided by a visitor. The second file just outputs the image to the browser when called, and the last file is a form protected against spam with the CAPTCHA.

The algorithm is as follows:

  1. When requested by the form, captcha.image.php creates an image with a word and stores this word in the session.
  2. The visitor types the word (correctly or not) and submits the form, and the form verifies the word.
  3. If the word provided by the visitor is the same as the word stored in the session, and the session variable exists and is not empty, the form does what it should do, for example adds a comment to a blog. If the word is wrong, the form just stops processing, assuming that the visitor is a spambot. In either case, whether the word is wrong or right, the script destroys the session variable with the word.

Please note the last sentence. It's a great breach in the wall if the session variable is not deleted after a submit, or if the visitor's word is merely compared with the one in the session. You have to be sure that the session variable exists and is not empty, and you must unset it after every submit.
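
The rule can be tried out without a web server. In the sketch below plain arrays stand in for $_SESSION and $_POST, and the function name verify_once() is hypothetical; the point is only to show that a stored word is good for a single attempt and that an empty session never matches.

```php
<?php
// Hypothetical sketch of the rule above. $session and $post stand in for
// $_SESSION and $_POST so the logic can be run from the command line.
function verify_once(array &$session, array $post) {
    $expected = (isset($session['magicword']) && $session['magicword'] !== '')
        ? $session['magicword'] : null;
    $given = isset($post['magicword']) ? (string) $post['magicword'] : '';
    // Always discard the stored word, whatever the outcome
    unset($session['magicword']);
    return $expected !== null && md5($given) === $expected;
}

$session = array('magicword' => md5('K7PWA'));
var_dump(verify_once($session, array('magicword' => 'K7PWA'))); // bool(true)
var_dump(verify_once($session, array('magicword' => 'K7PWA'))); // bool(false): replay of a used word
var_dump(verify_once($session, array()));                       // bool(false): nothing stored, nothing sent
```

Without the unset() a bot could solve one image by hand and then reuse the same answer forever.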

Functions

We will create two functions in the file captcha.function.php: captcha_show_image() and captcha_verify_word(). The first creates an image with a word, the second verifies the visitor's input. First of all we have to create an image:

<?php
function captcha_show_image() {
    // Let's create an image 200 px wide and 40 px high...
    $captcha_image = imagecreate(200, 40);
    // ... with a red (255, 0, 0) background. With imagecreate(), the first color allocated fills the background
    $captcha_image_bgcolor = imagecolorallocate($captcha_image, 255, 0, 0);
}
?>

The next step is to create some random colors lighter and darker than the background, light red and dark red in this case. We will use these colors for the noise and for drawing the word:

<?php
function captcha_show_image() {
    // ...The above code goes here
    /*
    imagecolorallocate() returns a color identifier representing the color composed of the given RGB components.
    For the lighter colors we keep R at 255 and add a random value to the G and B components;
    for the darker colors we subtract a random value from the R component and add a small random value to the G and B components.
    You can play with the color palette in Photoshop to see how the numbers change when you change the base color.
    */
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(230, 240), mt_rand(230, 240));
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(230, 240), mt_rand(230, 240));
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(160, 220), mt_rand(160, 220));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
}
?>
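
If you want to look at the color ranges without drawing anything, a sketch like this may help. The helper names captcha_light_red() and captcha_dark_red() are hypothetical; they return plain array(R, G, B) values using the same ranges as the first light color and the dark colors above, and they assume a pure red (255, 0, 0) base.

```php
<?php
// Hypothetical helpers returning array(R, G, B) so the ranges can be
// inspected without GD. The base color is assumed to be pure red.
function captcha_light_red() {
    // R stays at 255, G and B are raised close to white
    return array(255, mt_rand(230, 240), mt_rand(230, 240));
}

function captcha_dark_red() {
    // R is lowered, G and B get a small random tint
    return array(255 - mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
}

list($r, $g, $b) = captcha_light_red();
printf("light red: #%02X%02X%02X\n", $r, $g, $b);
list($r, $g, $b) = captcha_dark_red();
printf("dark red:  #%02X%02X%02X\n", $r, $g, $b);
```

Paste the printed hex values into any color picker to see how far the random shades drift from the base color.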

Now let's put some noise over the background. It will be random rectangles and a grid of lines, each at a random angle:

<?php
function captcha_show_image() {
    // ...The above code goes here
    // Rectangles
    for ($i = 0; $i <= 10; $i++) {
        imagefilledrectangle($captcha_image, $i*20+mt_rand(4, 26), mt_rand(0, 39), $i*20-mt_rand(4, 26), mt_rand(0, 39), $captcha_image_dcolor[mt_rand(0, 2)]);
    }
    // Grid
    for ($i = 0; $i <= 10; $i++) {
        imageline($captcha_image, $i*20+mt_rand(4, 26), 0, $i*20-mt_rand(4, 26), 39, $captcha_image_lcolor[mt_rand(0, 2)]);
    }
    for ($i = 0; $i <= 10; $i++) {
        imageline($captcha_image, $i*20+mt_rand(4, 26), 39, $i*20-mt_rand(4, 26), 0, $captcha_image_lcolor[mt_rand(0, 2)]);
    }
}
?>

It will look like this:

Example of random background for CAPTCHA image. Output of PHP and GD

It's time to create the CAPTCHA “word”. We have to define an “alphabet” without look-alike symbols such as “1”, “I” and “l”, “0” and “O”, and so on:

<?php
function captcha_show_image() {
    // ...The above code goes here
    $symbols = array('2', '3', '4', '5', '6', '7', '8', '9', 'A', 'C', 'E', 'G', 'H', 'K', 'M', 'N', 'P', 'R', 'S', 'U', 'V', 'W', 'X', 'Y', 'Z');
    $captcha_word = '';
    for ($i = 0; $i <= 4; $i++) {
        // Pick a random index between 0 and the last element of the alphabet
        $captcha_word .= $symbols[mt_rand(0, count($symbols) - 1)];
    }
}
?>
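
As a variation, the word generator can be written as a small stand-alone function. This is a sketch under assumptions, not part of the script above: the name captcha_random_word() is hypothetical, the alphabet string is just an example, and random_int() requires PHP 7 or later (on older versions mt_rand() can be used instead, with weaker randomness).

```php
<?php
// Hypothetical variant of the word generator using random_int() (PHP 7+).
function captcha_random_word(array $symbols, $length = 5) {
    $word = '';
    $last = count($symbols) - 1; // highest valid index, whatever the alphabet size
    for ($i = 0; $i < $length; $i++) {
        $word .= $symbols[random_int(0, $last)];
    }
    return $word;
}

// Example alphabet without look-alike symbols
$symbols = str_split('23456789ACEGHKMNPRSUVWYZ');
echo captcha_random_word($symbols), "\n";
```

Taking the upper bound from count() means you can add or remove symbols without touching the loop.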

Now let's place the word on the image. Each letter will have a random position, size, angle and font. You should have 3 different TrueType fonts: serif, sans-serif and a “comic” one. Name them “1.ttf”, “2.ttf” and “3.ttf” and place them in the folder with the script:

<?php
function captcha_show_image() {
    // ...The above code goes here
    for ($i = 0; $i <= 4; $i++) {
        // The font is given with a full path, because GD may not look for a relative font file in the script's folder
        imagettftext($captcha_image, mt_rand(24, 28), mt_rand(-20, 20), $i*mt_rand(30, 36)+mt_rand(2,4), mt_rand(32, 36), $captcha_image_lcolor[mt_rand(0, 1)], dirname(__FILE__).'/'.mt_rand(1, 3).'.ttf', $captcha_word[$i]);
    }
}
?>
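
To see which values each letter can get, the random arguments can be pulled out into a helper and printed. This is only an inspection sketch; the name captcha_letter_layout() is hypothetical and the ranges are copied from the imagettftext() call above.

```php
<?php
// Hypothetical helper mirroring the arguments passed to imagettftext() above,
// so the random ranges can be inspected without drawing anything.
function captcha_letter_layout($index) {
    return array(
        'size'  => mt_rand(24, 28),                           // font size
        'angle' => mt_rand(-20, 20),                          // rotation in degrees
        'x'     => $index * mt_rand(30, 36) + mt_rand(2, 4),  // start of the baseline
        'y'     => mt_rand(32, 36),                           // height of the baseline
        'font'  => mt_rand(1, 3) . '.ttf',
    );
}

for ($i = 0; $i <= 4; $i++) {
    $l = captcha_letter_layout($i);
    echo "letter $i: x={$l['x']} y={$l['y']} angle={$l['angle']} size={$l['size']} font={$l['font']}\n";
}
```

The last letter starts at x = 148 at most, so five letters of this size stay inside the 200 px wide image.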

It will look like this:

Example of random background and random text on CAPTCHA image. Output of PHP and GD

Let's put some noise over the text - random lines that will make it difficult for spambots to recognize the word:

<?php
function captcha_show_image() {
    // ...The above code goes here
    // Noise over the word
    // A 7-pixel line pattern of random dark colors, used by IMG_COLOR_STYLED below
    imagesetstyle($captcha_image, array($captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)]));
    for ($i = 0; $i <= 5; $i++) {
        imageline($captcha_image, 0, mt_rand(0, 39), 199, mt_rand(0, 39), IMG_COLOR_STYLED);
    }
    // Random start and end heights for the light lines
    $captcha_image_lineys = array(mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39));
    $captcha_image_lineye = array(mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39));
    for ($i = 0; $i <= 9; $i++) {
        imageline($captcha_image, $i*20+mt_rand(1, 6), $captcha_image_lineys[$i], $i*16+mt_rand(1, 6), $captcha_image_lineye[$i], $captcha_image_lcolor[mt_rand(0, 1)]);
        imageline($captcha_image, $i*20+mt_rand(1, 6), $captcha_image_lineys[$i], $i*16+mt_rand(1, 6), $captcha_image_lineye[$i], $captcha_image_lcolor[mt_rand(0, 1)]);
    }
}
?>

It will look like this:

Example of complete CAPTCHA image with random background and random text with noise over the text. Output of PHP and GD

This piece of code stores the word in a session:

<?php
function captcha_show_image() {
    // ...The above code goes here
    session_start();
    // Only the MD5 hash of the word is kept in the session
    $_SESSION['magicword'] = md5($captcha_word);
}
?>

And now we're ready to output the image to the browser. Note that you have to send some headers to prevent the image from being cached:

<?php
function captcha_show_image() {
    // ...The above code goes here
    // Output the image to browser
    header('Content-type: image/png');
    header('Expires: Sun, 1 Jan 2000 12:00:00 GMT');
    // Note the space before GMT: the date and the time zone must be separated
    header('Last-Modified: '.gmdate("D, d M Y H:i:s").' GMT');
    header('Cache-Control: no-store, no-cache, must-revalidate');
    header('Cache-Control: post-check=0, pre-check=0', false);
    header('Pragma: no-cache');
    imagepng($captcha_image);
    imagedestroy($captcha_image);
}
?>

Finally we are done with the first function. This is the whole code, without comments:

<?php
function captcha_show_image() {
    $captcha_image = imagecreate(200, 40);
    $captcha_image_bgcolor = imagecolorallocate($captcha_image, 255, 0, 0);
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(230, 240), mt_rand(230, 240));
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(230, 240), mt_rand(230, 240));
    $captcha_image_lcolor[] = imagecolorallocate($captcha_image, 255, mt_rand(160, 220), mt_rand(160, 220));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
    $captcha_image_dcolor[] = imagecolorallocate($captcha_image, 255-mt_rand(50, 100), mt_rand(0, 50), mt_rand(0, 50));
    for ($i = 0; $i <= 10; $i++) {
        imagefilledrectangle($captcha_image, $i*20+mt_rand(4, 26), mt_rand(0, 39), $i*20-mt_rand(4, 26), mt_rand(0, 39), $captcha_image_dcolor[mt_rand(0, 2)]);
    }
    for ($i = 0; $i <= 10; $i++) {
        imageline($captcha_image, $i*20+mt_rand(4, 26), 0, $i*20-mt_rand(4, 26), 39, $captcha_image_lcolor[mt_rand(0, 2)]);
    }
    for ($i = 0; $i <= 10; $i++) {
        imageline($captcha_image, $i*20+mt_rand(4, 26), 39, $i*20-mt_rand(4, 26), 0, $captcha_image_lcolor[mt_rand(0, 2)]);
    }
    $symbols = array('2', '3', '4', '5', '6', '7', '8', '9', 'A', 'C', 'E', 'G', 'H', 'K', 'M', 'N', 'P', 'R', 'S', 'U', 'V', 'W', 'X', 'Y', 'Z');
    $captcha_word = '';
    for ($i = 0; $i <= 4; $i++) {
        $captcha_word .= $symbols[mt_rand(0, count($symbols) - 1)];
    }
    for ($i = 0; $i <= 4; $i++) {
        imagettftext($captcha_image, mt_rand(24, 28), mt_rand(-20, 20), $i*mt_rand(30, 36)+mt_rand(2,4), mt_rand(32, 36), $captcha_image_lcolor[mt_rand(0, 1)], dirname(__FILE__).'/'.mt_rand(1, 3).'.ttf', $captcha_word[$i]);
    }
    imagesetstyle($captcha_image, array($captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)], $captcha_image_dcolor[mt_rand(0, 2)]));
    for ($i = 0; $i <= 5; $i++) {
        imageline($captcha_image, 0, mt_rand(0, 39), 199, mt_rand(0, 39), IMG_COLOR_STYLED);
    }
    $captcha_image_lineys = array(mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39));
    $captcha_image_lineye = array(mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39), mt_rand(0, 39));
    for ($i = 0; $i <= 9; $i++) {
        imageline($captcha_image, $i*20+mt_rand(1, 6), $captcha_image_lineys[$i], $i*16+mt_rand(1, 6), $captcha_image_lineye[$i], $captcha_image_lcolor[mt_rand(0, 1)]);
        imageline($captcha_image, $i*20+mt_rand(1, 6), $captcha_image_lineys[$i], $i*16+mt_rand(1, 6), $captcha_image_lineye[$i], $captcha_image_lcolor[mt_rand(0, 1)]);
    }
    session_start();
    $_SESSION['magicword'] = md5($captcha_word);
    header('Content-type: image/png');
    header('Expires: Sun, 1 Jan 2000 12:00:00 GMT');
    header('Last-Modified: '.gmdate("D, d M Y H:i:s").' GMT');
    header('Cache-Control: no-store, no-cache, must-revalidate');
    header('Cache-Control: post-check=0, pre-check=0', false);
    header('Pragma: no-cache');
    imagepng($captcha_image);
    imagedestroy($captcha_image);
}
?>

What about the captcha_verify_word() function? OK, I'll reveal it to you. Place this code in captcha.function.php before or after the previous function:

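function captcha_verify_word() {
    // The session word must exist and be non-empty before anything is compared
    $expected = !empty($_SESSION['magicword']) ? $_SESSION['magicword'] : '';
    // A missing or non-string field counts as an empty answer
    $given = (isset($_POST['magicword']) && is_string($_POST['magicword'])) ? $_POST['magicword'] : '';
    // The stored word is valid for one attempt only, whatever the result
    unset($_SESSION['magicword']);
    // Strict comparison, so that two different hashes can never be treated as equal numbers
    return $expected !== '' && md5($given) === $expected;
}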

As mentioned above, you have to be sure that the session variable exists and is not empty, and you must unset it after every submit. Using the function is simple: it returns true if the word is right and false otherwise. Note that session_start() has to be called before captcha_verify_word(), as form.php does.
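
One optional refinement, shown here only as a sketch: since the alphabet is upper-case, you may want to accept lower-case input and stray spaces. The function name captcha_word_matches() is hypothetical, and hash_equals() requires PHP 5.6 or later.

```php
<?php
// Hypothetical variant: normalise the visitor's input before hashing, so that
// " k7pwa " is accepted for "K7PWA". hash_equals() needs PHP 5.6 or later.
function captcha_word_matches($input, $stored_hash) {
    if (!is_string($stored_hash) || $stored_hash === '') {
        return false; // nothing was stored, so nothing can match
    }
    $normalised = strtoupper(trim((string) $input));
    return hash_equals($stored_hash, md5($normalised));
}

$stored = md5('K7PWA');
var_dump(captcha_word_matches(' k7pwa ', $stored)); // bool(true)
var_dump(captcha_word_matches('K7PWB', $stored));   // bool(false)
var_dump(captcha_word_matches('', null));           // bool(false)
```

Whether to forgive the letter case is a usability decision; it makes the word easier to type without making the image easier to read for a bot.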

Now it's time to open the captcha.image.php file and place the following code there:

<?php
include('captcha.function.php');
captcha_show_image();
die;
?>

Yes, it's simple. But note that since the captcha_show_image() function sends some HTTP headers and outputs the image, there must be no spaces, tabs or characters of any kind before the opening and after the closing PHP tags in the files captcha.function.php and captcha.image.php! Not even a single space! This code is OK:

<?php include('captcha.function.php');
captcha_show_image();
die;
/*
THE TRAGEDY OF MACBETH
by William Shakespeare

Dramatis Personae
DUNCAN, King of Scotland
MACBETH, Thane of Glamis and Cawdor, a general in the King's army
...
*/
?>

And this code is wrong:

...
MACBETH. Thou losest labor.
As easy mayst thou the intrenchant air
With thy keen sword impress as make me bleed.
Let fall thy blade on vulnerable crests;
I bear a charmed life, which must not yield
To one of woman born.
...
<?php
include('captcha.function.php');
captcha_show_image();
die;
?>

Once that is done, you can finally see the image in your browser by typing the URL of captcha.image.php.

Now let's make an HTML form protected against spambots with the CAPTCHA image. As an example this will be a very short form with only one field, though of course the script can protect any form with any number of fields for any purpose, be it a forum, a blog or a simple “contact us” form on your website.

So open the file form.php and paste this code:

<?php
include('captcha.function.php');

session_start();

// Initialising the $error variable. At the start it is 0. Each field in this form is processed, and if something is wrong (empty input, message too short, invalid CAPTCHA code, etc.) $error is incremented. The form is shown again if $error is > 0.
$error = 0;
$message_text = 'Please enter your message';
$message = '';
$captcha_text = 'Please tell me you\'re not a spambot';

if (count($_POST) == 0) {
    show_form();
    die;
} else {
    // Checking message
    if (empty($_POST['message'])) {
        $error += 1;
        $message_text = 'Please enter your message';
    } elseif (strlen($_POST['message']) < 10) {
        $error += 1;
        $message_text = 'Please enter no less than 10 symbols';
        $message = $_POST['message'];
    } else {
        $message_text = 'Your message';
        $message = $_POST['message'];
    }

    // Now let's check TheCAPTCHA
    if (!captcha_verify_word()) {
        $error += 1;
        $captcha_text = 'Wrong image code';
    }

    // If $error is > 0, we'll show the form again
    if ($error > 0) {
        show_form();
        die;
    } else {
        // If everything is ok, we'll do something
        $message = wordwrap(strip_tags(trim($_POST['message'])), 70);
        echo "Thank you for your message! Your CAPTCHA code was right. This is your message: $message";
        die();
    }
}
// This function shows the form
function show_form() {
?>
<html>
<head>
<title>My form</title>
</head>
<body>
<h1>Please feel free to leave your message:</h1>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<div><?php echo $GLOBALS['message_text']; ?><br />
<textarea name="message" cols="30" rows="6"><?php echo htmlspecialchars($GLOBALS['message']); ?></textarea></div>
<div><?php echo $GLOBALS['captcha_text']; ?><br />
<img src="captcha.image.php?nocache=<?php echo md5(time()); ?>" border="0" /><br />
<input name="magicword" type="text" /><br />
</div><br />
<div>Please fill in all the fields and press the button below</div>
<div><button type="submit">Submit your message</button></div>
</form>
</body>
</html>
<?php
}
?>

Finally everything is done!

Now you have a CAPTCHA function to protect your site against spambots! Just point your browser to form.php and see how it works. As you can see, everything is quite easy.

Eugene Orlov

And who is the author?

Eugene Orlov. I'm 30 and live in Moscow, Russia. Please contact me if you have any questions. Content of this site is available under the terms of the GNU Lesser General Public License.