system> mysql lse
mysql> delete from words;
mysql> delete from pages;
mysql> quit;
/usr/local/apache/htdocs/, where two big directories exist: big1/
and big2/.
In that case, you can issue two indexing statements using:
LSE-index dbname /usr/local/apache/htdocs/big1 /big1
LSE-index dbname /usr/local/apache/htdocs/big2 /big2
Using this approach you could schedule runs with cron for odd and even
days, or whichever way you like.
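The odd/even split can be sketched as a small wrapper script that cron calls once a day. This is only an illustration: the function name pick_tree is hypothetical, and the script prints the LSE-index command instead of running it, so that the choice can be checked first.

```shell
#!/bin/sh
# Hypothetical wrapper: choose the subtree to index from the day of the month.

# pick_tree DAY -> prints "big1" on odd days, "big2" on even days
pick_tree() {
    day=$1
    # strip one leading zero so that 08 and 09 are not read as octal
    day=${day#0}
    if [ $((day % 2)) -eq 1 ]; then
        echo big1
    else
        echo big2
    fi
}

tree=$(pick_tree "$(date +%d)")
# Printed only; remove 'echo' to really start the indexer.
echo LSE-index dbname "/usr/local/apache/htdocs/$tree" "/$tree"
```

Note that a month with 31 days is followed by day 1, so big1 is then indexed on two consecutive days.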
If LSE-index should crash
for some reason, there is a risk that the last indexed file was only
partially processed. If you think that this is the case, follow
these steps:
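Find the pages with the highest timestamp:
select max(stamp) from pages;
select id from pages where stamp = maxvalue;
update pages set stamp = 0 where id = id;
Here maxvalue stands for the timestamp returned by the first query, and the
id to the right of the equals sign in the last statement stands for a page
ID returned by the second query. Fill in the real values; typed literally,
"where id = id" would match every row.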
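The steps above can also be combined so that no value has to be copied by hand. The following is a sketch, not part of LSE itself: it assumes the pages table with the id and stamp columns used above, keeps the maximum in a MySQL user variable (@maxstamp, a name chosen here), and resets by timestamp rather than by ID. The function name is hypothetical; the script only prints the SQL.

```shell
#!/bin/sh
# Sketch: emit the SQL that resets the stamp of the most recently indexed
# page(s). Assumes the 'pages' table with a 'stamp' column as shown above.
reset_last_indexed_sql() {
    cat <<'EOF'
select @maxstamp := max(stamp) from pages;
update pages set stamp = 0 where stamp = @maxstamp;
EOF
}

# Review the statements first, then feed them to the client, e.g.:
#   reset_last_indexed_sql | mysql lse
reset_last_indexed_sql
```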
E.g., consider the following hypothetical situation. You have a directory
/usr/local/apache/htdocs/secure/ which is protected by an
authentication method of the webserver. The search engine should not
return hits pointing into this directory tree, because the search results
of any non-authenticated visitor would be polluted by unreachable document
links. In this case, you would remove the pages in MySQL:
delete from pages where uri like '/secure%';
Or consider the following. You want to take 'non-words' such as 'the' out of the index dictionary, because these words are meaningless in searches:
mysql> # Step 1: determine the word ID for 'the'
mysql> select id from words where word = 'the';
+-----+
| id  |
+-----+
| 284 |
+-----+
1 row in set (0.34 sec)
mysql> # Step 2: kill the entry in the words list
mysql> delete from words where id = 284;
mysql> # Step 3: kill the entries in the pages hitlist
mysql> delete from hits where wordid = 284;
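When several non-words have to go, the three steps can be generated per word. The sketch below assumes the words (id, word) and hits (wordid) tables from the example above and a MySQL version that supports subqueries (4.1 or later). The function name nonword_sql is hypothetical, and the script only prints the statements.

```shell
#!/bin/sh
# nonword_sql WORD... -> prints the cleanup statements for each word.
# The hits are deleted first, because that statement still needs the
# entry in 'words' to look up the word ID.
nonword_sql() {
    for w in "$@"; do
        # accept plain letters only, so no SQL quoting is needed
        case $w in
            ''|*[!A-Za-z]*) echo "skipping '$w'" >&2; continue ;;
        esac
        echo "delete from hits where wordid in (select id from words where word = '$w');"
        echo "delete from words where word = '$w';"
    done
}

# Review the output, then pipe it into the client, e.g.:
#   nonword_sql the and of | mysql lse
nonword_sql the and of
```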
Note that as of version 1.02, the script LSE-index supports a flag
-n, with which non-words can be specified on the command line.
<form method="post" action="search.php">
  <input type="text" size="15" name="words"><br>
  <input type="hidden" name="logical" value="or">
  <input type="hidden" name="matchmode" value="matchexact">
  <input type="submit" value="search">
</form>
Please see section 4.3 to understand why the form variables have the shown names and what their meanings are.
-m of the indexer
LSE-index. The flag works in a rudimentary way: LSE-index
simply waits until the CPU consumption of all mysqld processes
drops below a given percentage. Even so, with this flag you can start
really long indexing jobs and let them run.
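To illustrate the idea of waiting for mysqld to calm down, here is a minimal sketch of such a loop. It is not the code LSE-index uses: the function names, the 10-second polling interval, and the use of the Linux procps option "ps -C" are all assumptions made for this example.

```shell
#!/bin/sh
# total_cpu: sum a column of per-process CPU percentages read from stdin
# and print the total as a whole number (fraction truncated).
total_cpu() {
    awk '{ sum += $1 } END { printf "%d\n", sum }'
}

# below_limit LIMIT: succeeds when the summed CPU of all mysqld
# processes is below LIMIT percent (Linux procps 'ps' assumed).
below_limit() {
    load=$(ps -C mysqld -o %cpu= | total_cpu)
    [ "$load" -lt "$1" ]
}

# wait_for_idle LIMIT: poll every 10 seconds until below_limit succeeds.
wait_for_idle() {
    until below_limit "$1"; do
        sleep 10
    done
}

# Example: wait_for_idle 50 before starting a heavy job.
```

Be aware that procps reports %cpu as CPU time divided by the time the process has been running, so the figure reacts slowly to short bursts of load.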