Term Extraction 1.0 — from FiveFilters.org


Quick start

You can use the web form on this service's index page to quickly extract terms and see the kind of output you can expect.

To use this tool from within your own application, have a look at our examples page.

URL Construction

To extract terms from a given text, use the following request URL:

  • /extract.php
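A request URL for extract.php can be assembled from the request parameters documented on this page. Below is a minimal Python sketch using only the standard library. It assumes a hypothetical host (example.com) and a hypothetical helper name, build_extract_url; only the parameter names (text, url, output, max, lowercase, exc[]) come from this documentation.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with wherever your copy of Term Extraction is hosted.
BASE_URL = "http://example.com/extract.php"


def build_extract_url(base_url, text=None, url=None, output="json",
                      max_terms=None, lowercase=False, exclude=()):
    """Build a GET request URL for extract.php (hypothetical helper)."""
    # Exactly one input source: raw text or a web article URL.
    if (text is None) == (url is None):
        raise ValueError("supply exactly one of text or url")
    params = [("output", output)]
    if text is not None:
        params.append(("text", text))
    else:
        params.append(("url", url))
    if max_terms is not None:
        params.append(("max", str(max_terms)))
    if lowercase:
        params.append(("lowercase", "1"))
    # exc[] may be repeated, so each exclusion becomes its own key/value pair.
    for term in exclude:
        params.append(("exc[]", term))
    return base_url + "?" + urlencode(params)


print(build_extract_url(BASE_URL, text="solar power", max_terms=10,
                        lowercase=True, exclude=["solar"]))
```

The parameters are passed as a list of pairs rather than a dict so that exc[] can appear more than once; urlencode percent-encodes the brackets, and PHP decodes them back to exc[] on the server side.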

Request Parameters

The following parameters can be used:

  • text (string): The text to extract terms from, UTF-8 encoded. English is the only supported language.
  • output (json, xml, txt, php or html): The format in which to return the terms.
  • terms_only (1 or 0, default 0): Set this to 1 to return only the terms, without their occurrence counts and word counts. Applies to JSON output only.
  • max (number, default 50): The maximum number of terms to return.
  • lowercase (1 or 0, default 0): Set this to 1 to convert all extracted terms to lowercase.
  • callback (string): For JSONP, the name of your JavaScript function that will receive the JSON response. Has no effect unless JSON output is requested. Allowed characters: A-Z, a-z, 0-9, '.', '[', ']' and '_'.
  • url (string): Can be used instead of 'text' or 'text_or_url' to point to a web article.
  • text_or_url (string): For convenience, can be used instead of 'text' or 'url'; accepts either a URL on its own or some text.
  • key (string): Access key. Required only if you've set one up in custom_config.php.
  • yahoo (1 or 0, default 0): Set this to 1 to enable Yahoo mode, whose output format matches that of Yahoo's Term Extraction service. Alternatively, you can simply call yahoo.php instead of extract.php to enable Yahoo mode.
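Since the callback parameter only accepts a restricted set of characters, a client can check a JSONP function name before sending the request. This is a minimal Python sketch; the helper name is_valid_callback is hypothetical, and how the service itself treats disallowed characters is not specified here.

```python
import re

# Allowed JSONP callback characters, per the 'callback' parameter description:
# A-Z, a-z, 0-9, '.', '[', ']' and '_'. Empty names are rejected.
CALLBACK_PATTERN = re.compile(r"[A-Za-z0-9.\[\]_]+")


def is_valid_callback(name):
    """Return True if name consists only of the allowed characters."""
    return CALLBACK_PATTERN.fullmatch(name) is not None


print(is_valid_callback("app.handlers[0]"))  # nested function references are allowed
print(is_valid_callback("alert(1)"))         # parentheses are not
```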

Filtering

These parameters can be used to filter the results.

  • min_occurrence (number, default 1): The minimum number of times a single-word (unigram) term must appear for it to be included in the output.
  • max_strength (number, default 3): Strength is the number of words in a term, so to limit results to terms of at most 2 words, set this to 2.
  • keep_if_strength (number, default 2): Keep a term if its word count is equal to or greater than this value, regardless of occurrence.
  • exc[] (string): Exclude any term that fully or partially matches this string. This parameter can appear multiple times.
  • filter (1 or 0, default 1): Set this to 0 to disable filtering, overriding the four parameters above.
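To make the interplay of these filtering rules concrete, here is a minimal Python sketch of one reading of the descriptions. It is not Term Extraction's own code: the Term class and filter_terms function are hypothetical, the order in which the rules are applied is assumed, and exc[] is assumed to be a case-sensitive substring test. It also assumes the occurrence check applies to any term below keep_if_strength, which with the defaults means single-word terms only.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    occurrence: int  # how many times the term appears in the input

    @property
    def strength(self):
        # Strength is the number of words in the term.
        return len(self.text.split())


def filter_terms(terms, min_occurrence=1, max_strength=3, keep_if_strength=2,
                 exc=(), filter_enabled=True):
    """Approximate the documented filtering rules (an interpretation only)."""
    if not filter_enabled:  # filter=0 bypasses all four rules
        return list(terms)
    kept = []
    for term in terms:
        if term.strength > max_strength:
            continue
        # Assumption: exc[] is a case-sensitive substring match.
        if any(pattern in term.text for pattern in exc):
            continue
        # Terms at or above keep_if_strength skip the occurrence check.
        if term.strength < keep_if_strength and term.occurrence < min_occurrence:
            continue
        kept.append(term)
    return kept


sample = [Term("energy", 1), Term("solar", 4), Term("solar power", 1)]
print([t.text for t in filter_terms(sample, min_occurrence=2)])
```

With min_occurrence set to 2, the single-word term "energy" (seen once) is dropped, while the two-word term "solar power" survives despite appearing only once, because its strength meets the default keep_if_strength of 2.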

Yahoo compatibility

These additional parameters can be used instead of the 'key' and 'text' parameters above. They are here for compatibility with Yahoo's Term Extraction service.

  • appid (string): Same as 'key'.
  • context (string): Same as 'text'.
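Because appid and context are simply aliases, a client or proxy that receives Yahoo-style parameters can map them onto the native names. The following Python sketch shows that mapping; the helper name normalise_params is hypothetical, and the rule that a native parameter wins when both forms are present is an assumption, since the precedence is not documented.

```python
# Yahoo-style parameter names and the native names they stand in for.
YAHOO_ALIASES = {"appid": "key", "context": "text"}


def normalise_params(params):
    """Map Yahoo-style names onto native ones (hypothetical helper)."""
    native = {k: v for k, v in params.items() if k not in YAHOO_ALIASES}
    for yahoo_name, native_name in YAHOO_ALIASES.items():
        if yahoo_name in params:
            # Assumption: a native parameter, if also given, takes precedence.
            native.setdefault(native_name, params[yahoo_name])
    return native


print(normalise_params({"appid": "abc", "context": "some text", "output": "xml"}))
```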

Required parameters

Either text, url, or text_or_url must be supplied.
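A client can check this requirement before making a request. This minimal Python sketch treats an empty string as "not supplied", which is an assumption; the helper name has_required_input is hypothetical. Requests using the Yahoo-style 'context' name would need to be translated to 'text' first.

```python
# At least one of these must be present for a request to be valid.
REQUIRED_ANY = ("text", "url", "text_or_url")


def has_required_input(params):
    """True if text, url or text_or_url is supplied with a non-empty value."""
    return any(params.get(name) for name in REQUIRED_ANY)


print(has_required_input({"url": "http://example.com/article"}))
print(has_required_input({"output": "json"}))
```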

Configure

In addition to the options above, Term Extraction comes with a configuration file which allows you to control how the application works.

To change the configuration, save a copy of config.php as custom_config.php and make any changes you like to it.

Customise this page

If everything works fine, feel free to modify this page by following the steps below:

  1. Save a copy of index.php as custom_index.php
  2. Edit custom_index.php

Next time you load this page, it will automatically load custom_index.php instead.

Support

Check our help centre if you need help. You can also email us at help@fivefilters.org.

Thank you!

Thanks for downloading and setting up the Term Extraction web service. This software is developed and maintained by FiveFilters.org.

About

Term Extraction from FiveFilters.org is a free software project to help you perform term extraction through a web service. Given some text, it returns a list of terms, ideally with the most relevant first. Terms can be returned in a number of formats. The application is intended to be a simple, free alternative to Yahoo's Term Extraction service. English is currently the only supported language.

Free Software

Note: 'Free' as in 'users have the freedom to run, copy, distribute, study, change and improve the software' (see the free software definition).

If you're the owner of this site and you plan to offer this service to others through your hosted copy, please keep a download link so users can grab a copy of the code if they want it (you can either offer a free download yourself, or link to the purchase option on fivefilters.org to support us).

For full details, please refer to the license.

If you're not the owner of this site (i.e. you're not hosting it yourself), you don't have to rely on an external service if you don't want to. You can download your own copy of Term Extraction under the AGPL license.

Software Components

Term Extraction is written in PHP and relies on the following primary components:

Depending on your use, these secondary components may also be used:

System Requirements

PHP 5.2 or above is required. A simple shared web hosting account should work fine.

Download

Download from FiveFilters.org


Term Extraction is licensed under the AGPL version 3. More information about why we use this license can be found on FiveFilters.org.

The software components in this application are licensed as follows...