close

http://en.wikipedia.org/wiki/Approximate_string_matching





n computing, approximate string matching is the technique of finding approximate matches to a pattern in a string.


The closeness of a match is measured in terms of the number of
primitive operations necessary to convert the string into an exact
match. This number is called the edit distance — also called the Levenshtein distance — between the string and the pattern. The usual primitive operations are:[1]


  • insertion (e.g., changing cot to coat),

  • deletion (e.g. changing coat to cot), and

  • substitution (e.g. changing coat to cost).

Some approximate matchers also treat transposition, in which the positions of two letters in the string are swapped, to be a primitive operation. Changing cost to cots is an example of a transposition.[1]

Different approximate matchers impose different constraints. Some
matchers use a single global unweighted cost, that is, the total number
of primitive operations necessary to convert the match to the pattern.
For example, if the pattern is coil, foil differs by one substitution, coils by one insertion, oil by one deletion, and foal by two substitutions. If all operations count as a single unit of cost and the limit is set to one, foil, coils, and oilfoal will not.
will count as matches while

Other matchers specify the number of operations of each type
separately, while still others set a total cost but allow different
weights to be assigned to different operations. Some matchers allow
separate assignments of limits and weights to individual groups in the
pattern.

Most approximate matchers used for text processing are regular expression matchers.[citation needed]
The distance between a candidate and the pattern is therefore computed
as the minimum distance between the candidate and a fixed string
matching the regular expression. Thus, if the pattern is co.l, using the POSIX notation in which a dot matches any single character, both coal and coil are exact matches, while soil differs by one substitution.


arrow
arrow
    全站熱搜
    創作者介紹
    創作者 UbuntuLinux 的頭像
    UbuntuLinux

    UbuntuLinux

    UbuntuLinux 發表在 痞客邦 留言(0) 人氣()