answered question

answers (1)

BEST ANSWER  decided by votes   |  davepamn  |  March 27, 2009 04:40 PM
You will need a regular-expression pattern-matching algorithm and a split parser. Vowel patterns, the length of the name, and popularity ranking are some considerations, and combinations of words may also imply a name. You will also need a human quality check to make sure the word really is a name. Use a queue to let candidate names accumulate. Let the filter draw on common names from a database, length thresholds, and vowel and consonant pattern matching; look at the words surrounding the candidate, before and after; let the operator select the names; and keep names that have already been selected from entering the queue again.
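
To make the filter-plus-queue idea concrete, here is a rough Ruby sketch. The names COMMON_NAMES, MIN_LENGTH, MAX_LENGTH, plausible_name? and enqueue_candidates are all made up for illustration, the small Set stands in for a real names database, and the thresholds are guesses you would tune.

```ruby
require 'set'

# Hypothetical stand-in for a database of common first names.
COMMON_NAMES = Set.new(%w[John Jane Jim Mary]).freeze
MIN_LENGTH = 2   # assumed length thresholds, to be tuned
MAX_LENGTH = 12

# True when a word passes the cheap automatic checks: capitalized,
# within the length thresholds, and containing at least one vowel.
def plausible_name?(word)
  word.match?(/\A[A-Z][a-z]+\z/) &&
    word.length.between?(MIN_LENGTH, MAX_LENGTH) &&
    word.match?(/[aeiouy]/i)
end

# Split the text into words and queue candidates for operator review,
# skipping anything already selected or already waiting in the queue.
def enqueue_candidates(text, queue, selected)
  text.scan(/[A-Za-z]+/).each do |word|
    next if selected.include?(word) || queue.include?(word)
    next unless COMMON_NAMES.include?(word) || plausible_name?(word)
    queue << word
  end
  queue
end

already_selected = Set.new(['Jane'])
p enqueue_candidates('John Brown met Jane at IBM.', [], already_selected)
# => ["John", "Brown"]
```

"Brown" gets through the automatic checks even though it is also a color, which is exactly the kind of case the human quality check is for.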

How does a person recognize that a phrase refers to a person and not a thing? What makes the structure of a name unique and recognizable? If I say "John Brown", how do you know this is a name? Perhaps certain words are reserved for names. Brown is a color, but in combination with John it becomes a name.

Some names share meaning with things. For example, Hewlett Packard is a computer brand and also people's surnames. International Business Machines is a company and not a person.

Interesting problem you're trying to solve.
source(s):
Pattern matching algorithms.

Voted as best: rosshann, cjd, williamwaco
Comment
davepamn  |  March 27, 2009 04:57 PM
You can also use part-of-speech analysis to identify the positions of nouns and direct objects in the sentence. Take "Jim threw the ball to Jane." Jim is the subject, threw is the verb, the ball is the direct object, and Jane is the object of the preposition "to". You could split the sentence at the verb into two parts, analyze each part for nouns, and then check whether each noun could be part of a name combination.
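
A toy Ruby version of the split-at-the-verb step might look like the following. It assumes a tiny hand-written verb list (VERBS) in place of a real part-of-speech tagger, and split_on_verb and name_candidates are hypothetical helper names, so treat it as a sketch of the idea only.

```ruby
# Hypothetical verb list; a real system would use a part-of-speech tagger.
VERBS = %w[threw was is caught hit].freeze

# Split a sentence into the words before and after the first known verb.
# Returns [all_words, []] when no known verb is found.
def split_on_verb(sentence)
  words = sentence.scan(/[A-Za-z']+/)
  idx = words.index { |w| VERBS.include?(w.downcase) }
  return [words, []] if idx.nil?
  [words[0...idx], words[(idx + 1)..-1]]
end

# Capitalized words in a fragment are treated as name candidates.
def name_candidates(fragment)
  fragment.select { |w| w.match?(/\A[A-Z][a-z]+\z/) }
end

before, after = split_on_verb('Jim threw the ball to Jane.')
p name_candidates(before)  # => ["Jim"]
p name_candidates(after)   # => ["Jane"]
```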

Look for adjectives, which describe nouns.
http://www.yourdictionary.com/grammar-rules/Examples-of-Adjectives.html

Discard the split fragment with the adjective.

Jane was acting very quirky like a mouse.

Jane | very quirky like a mouse

Jane

SXOX (O = consonant, X = vowel, S = start consonant)

The start consonant could be D, F, G, H, J, K, L, M, N, P, R, S, T, V, X, Y, Z+
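
The S/X/O encoding can be computed mechanically. This is a minimal Ruby sketch under my own assumptions: only a, e, i, o, u count as vowels (so y is a consonant), any other character is treated as a consonant, and shape is a made-up helper name.

```ruby
VOWELS = 'aeiou'.freeze

# Encode a word as S (start consonant), X (vowel), O (other consonant).
# A word that starts with a vowel simply starts with X.
def shape(word)
  word.downcase.each_char.with_index.map do |ch, i|
    if VOWELS.include?(ch)
      'X'
    elsif i.zero?
      'S'
    else
      'O'
    end
  end.join
end

puts shape('Jane')   # SXOX
puts shape('Brown')  # SOXOO
```

Counting how often each shape shows up among known names would give you the patterns to match against.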

The more data you get, the better your tuning and pattern matching will get.
davepamn  |  March 28, 2009 03:05 PM
Nouns that are rare may be good candidates for last names. A rarity rating can be used to evaluate the potential for a last name in combination with common first names stored in the database.
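
One possible way to express the rarity rating in Ruby is sketched below. The frequency numbers are invented for illustration, WORD_FREQUENCY and FIRST_NAMES stand in for real database tables, and the formula 1 / (1 + frequency) is just one simple choice, not an established measure.

```ruby
# Hypothetical data: occurrences per million words, and common first names.
WORD_FREQUENCY = { 'brown' => 120.0, 'ball' => 95.0, 'mouse' => 40.0 }.freeze
FIRST_NAMES = %w[John Jane Jim].freeze

# Rarity in (0, 1]: words that are rare or unseen score close to 1.
def rarity(word)
  1.0 / (1.0 + WORD_FREQUENCY.fetch(word.downcase, 0.0))
end

# Score a two-word pair: zero unless the first word is a known first name,
# otherwise the rarity of the second word.
def last_name_score(first, second)
  return 0.0 unless FIRST_NAMES.include?(first)
  rarity(second)
end

puts last_name_score('John', 'Packard')  # unseen word, so the score is 1.0
puts last_name_score('The', 'Packard')   # "The" is not a first name, so 0.0
```

A common word such as Brown scores low here even after John, so the rating is better used to rank candidates for the operator than to reject them outright.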
dallasrpi  |  March 30, 2009 01:05 AM
You have some good ideas, all a bit more complicated than I wanted to dive into. Here is something quick I threw together for the baseball articles I'm working on. The tricky part is when the second word in a sentence is capitalized. This also matches places and other things, but it does what I need. It could probably be tweaked a bit more.

sentence_starters = %w[The If In Not When So Next While As After Then What Where Once But]
# Split the content into words, punctuation, and abbreviations such as A.J.
# (article.content is the article text from my application.)
words = article.content.scan(/[A-Z]\.[A-Z]\.|[a-zA-Z'\-]+|[.,!]/)
terms = Hash.new(0) # candidate name => number of occurrences
(0..(words.size-1)).each do |i|
  # First word: capitalized, or initials such as A.J.
  if words[i].match(/^[A-Z][a-z\-]|^[A-Z]\.[A-Z]\./)
    # Note: at i = 0, words[i-1] wraps around to the last token.
    if words[i-1] && words[i+1] && words[i+2] &&
       words[i+1].match(/^[A-Z][a-z\-]/) && # second word is capitalized too
       words[i+2].match(/^[\.,\!]|^[a-z]+|^[A-Z]{2,3}|^Jr$|^Sr$/) &&
       words[i-1].match(/^[\.,\!]|^[a-z]+|^[A-Z]{2,3}/) &&
       !sentence_starters.include?(words[i])
      index = "#{words[i]} #{words[i+1]}"
      # Append the suffix when the token after the two-word name is Jr or Sr.
      index += " #{words[i+2]}" if words[i+2] == 'Jr' || words[i+2] == 'Sr'
      terms[index] += 1
    end
  end
end