
strfilter

SYNOPSIS

strfilter($s, $lang)

DESCRIPTION

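strfilter returns a character string containing the words of the character string $s, converted to lowercase, stripped of accents, without duplicates and without the words that are not significant in the language $lang, separated by a single space. If $lang is empty or has no entry in $stopwords, no words are removed as insignificant. If $s is empty, strfilter returns false.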

The insignificant words are defined in the file includes/stopwords.inc:

  global $stopwords;

  $stopwords = array(
      'en' => array(
          'a',
          'about',
          'above',
          ...
      'fr' => array(
          'a',
          'au',
          'aussi',
          ...

stopwords.inc defines the global variable $stopwords, an array that associates each language managed by the program with the list of words that are not significant in an index.
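The following PHP sketch illustrates how the $stopwords table is organized and queried. The word lists contain only the first entries shown above, and is_stopword is a hypothetical helper that is not part of iZend. Because strfilter compares words after flattening and lowercasing them, an entry can only match if it is stored lowercase and without accents, as in the listing.

```php
<?php
// Sample data: only the first entries shown in includes/stopwords.inc.
global $stopwords;
$stopwords = array(
    'en' => array('a', 'about', 'above'),
    'fr' => array('a', 'au', 'aussi'),
);

// Hypothetical helper (not part of iZend): returns true if $word is listed
// as insignificant for $lang. $word must already be lowercase and unaccented.
function is_stopword($word, $lang) {
    global $stopwords;

    // No list for $lang means no word is considered insignificant.
    return isset($stopwords[$lang]) && in_array($word, $stopwords[$lang], true);
}

var_dump(is_stopword('about', 'en'));   // bool(true)
var_dump(is_stopword('aussi', 'en'));   // bool(false): French list only
var_dump(is_stopword('au', 'de'));      // bool(false): no list for 'de'
```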

CODE
  global $stopwords;

  $stopwords = array();

  @include 'stopwords.inc';

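Loads the global variable $stopwords from the file stopwords.inc. The @ operator suppresses the warning if the file cannot be found, in which case $stopwords remains an empty array and strfilter removes no insignificant words.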
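A minimal sketch of how this @include pattern behaves, assuming a PHP CLI interpreter. The file name no-such-stopwords.inc is hypothetical, and the temporary file only mimics the shape of includes/stopwords.inc with its first entries. It relies on documented behavior of include: it returns false and raises a warning on failure, and returns 1 on success unless the included file returns a value.

```php
<?php
global $stopwords;
$stopwords = array();

// A missing file (hypothetical name): @ silences the warning, include
// returns false and $stopwords stays empty.
$missing = @include 'no-such-stopwords.inc';
$emptyAfterMissing = ($stopwords === array());

// A temporary file that assigns the global, like includes/stopwords.inc.
$file = tempnam(sys_get_temp_dir(), 'sw');
file_put_contents($file, '<?php global $stopwords; $stopwords = array('
    . "'en' => array('a', 'about', 'above'), 'fr' => array('a', 'au', 'aussi'));");
$loaded = @include $file;   // a successful include returns 1
unlink($file);

var_dump($missing);                                // bool(false)
var_dump($emptyAfterMissing);                      // bool(true)
var_dump($loaded);                                 // int(1)
echo implode(', ', array_keys($stopwords)), "\n";  // en, fr
```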

  require_once 'strflat.php';

  function strfilter($s, $lang) {
      global $stopwords;

      if ($s) {
          // Split $s on whitespace, remove accents, convert to lowercase,
          // then drop duplicates so that words differing only by case or
          // accents are merged.
          $wlist=array_unique(array_map('strtolower', array_map('strflat', preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY))));

          // Remove the insignificant words defined for $lang, if any.
          if ($lang && array_key_exists($lang, $stopwords)) {
              $wlist=array_diff($wlist, $stopwords[$lang]);
          }

          return implode(' ', $wlist);
      }

      // $s is empty.
      return false;
  }
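To try strfilter outside iZend, the following self-contained sketch supplies a hypothetical stand-in for strflat, which the real code loads from strflat.php; the stand-in maps only a few accented letters. It also uses a reduced $stopwords table holding the first entries shown above. It applies array_unique after strflat and strtolower, so that words differing only by case or accents, such as Été and été, count as duplicates.

```php
<?php
// Hypothetical stand-in for iZend's strflat(): maps only a few accented
// letters to their unaccented forms.
function strflat($s) {
    return strtr($s, array('É' => 'E', 'é' => 'e', 'è' => 'e', 'à' => 'a'));
}

// Sample stop words: only the first entries shown in includes/stopwords.inc.
global $stopwords;
$stopwords = array(
    'en' => array('a', 'about', 'above'),
    'fr' => array('a', 'au', 'aussi'),
);

// strfilter with array_unique applied after normalization.
function strfilter($s, $lang) {
    global $stopwords;

    if ($s) {
        $wlist = array_unique(array_map('strtolower', array_map('strflat',
            preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY))));

        if ($lang && array_key_exists($lang, $stopwords)) {
            $wlist = array_diff($wlist, $stopwords[$lang]);
        }

        return implode(' ', $wlist);
    }

    return false;
}

echo strfilter('A song about the Sea above a sea', 'en'), "\n"; // song the sea
echo strfilter('Aussi Été au été', 'fr'), "\n";                 // ete
var_dump(strfilter('', 'en'));                                  // bool(false)
```

In the first call, a, about and above are removed as English stop words and the second occurrences of a and sea as duplicates. In the second, aussi and au are French stop words and both spellings of été collapse into ete. Words keep the order of their first occurrence.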
SEE ALSO

translate
