Is there an open source tool that can extract people's names from text?
I'm looking for a way to get names from text when I don't know beforehand what names might appear. For example, a baseball article that talks about prospects or certain trainers, managers, and players.
answers (1)
You will need a regular-expression pattern-matching algorithm and a split parser. Considerations include vowel patterns, the length of the name, and popularity ranking; certain combinations of words may also imply a name. You will also need a human quality check to make sure the word really is a name. Use a queue to let candidate names accumulate; let the filter draw on a database of common names, length thresholds, and vowel and consonant pattern matching; look at the words immediately before and after the candidate; let the operator select the name; and keep names that have already been selected from entering the queue again.
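The queue-and-filter workflow described above could be sketched roughly as follows. This is only an illustration: the `ReviewQueue` class, the `COMMON_NAMES` list, and the length thresholds are assumptions made for the example, not part of any existing tool.

```ruby
require 'set'

# Hypothetical stand-in for a database of common names.
COMMON_NAMES = Set.new(%w[John Jane Mary Robert]).freeze

# Collects candidate names for a human operator to confirm.
class ReviewQueue
  attr_reader :pending, :selected

  def initialize(min_length: 2, max_length: 20)
    @min_length = min_length # assumed length thresholds
    @max_length = max_length
    @pending = []
    @selected = Set.new
  end

  # Queue a candidate unless it was already selected, is already
  # pending, or falls outside the length thresholds.
  def offer(word)
    return false if @selected.include?(word) || @pending.include?(word)
    return false unless word.length.between?(@min_length, @max_length)
    @pending << word
    true
  end

  # Candidates found in the common-name list are shown first;
  # partition keeps the original order within each group.
  def next_for_review
    known, unknown = @pending.partition { |w| COMMON_NAMES.include?(w) }
    known + unknown
  end

  # The operator confirms a name; it leaves the queue and is
  # rejected if offered again.
  def confirm(word)
    @pending.delete(word)
    @selected << word
  end
end
```

The vowel and consonant pattern check and the surrounding-word check are left out here to keep the sketch short; they would slot into `offer` as further conditions.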
How does a person recognize that a phrase refers to a person and not a thing? What makes the structure of a name unique and recognizable? If I say "John Brown", how do you know this is a name? Perhaps certain words are reserved for names. Brown is a color, but in combination with John it becomes a name.
Some names share meaning with things. For example, Hewlett Packard is both a brand of computer and a person's name. International Business Machines is a company and not a person.
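One way to act on the observation that "Brown" reads as a name only next to "John" is to require a known given name directly before a capitalized word. A rough sketch, in which the `GIVEN_NAMES` list and the `person_candidates` helper are illustrative assumptions:

```ruby
require 'set'

# Hypothetical list of given names; a real one would come from a larger data set.
GIVEN_NAMES = Set.new(%w[John Jane William]).freeze

# Returns "Given Surname" pairs in which a known given name is
# immediately followed by a capitalized word.
def person_candidates(text)
  words = text.scan(/[A-Za-z'\-]+/)
  pairs = words.each_cons(2).select do |first, second|
    GIVEN_NAMES.include?(first) && second.match?(/\A[A-Z][a-z]+\z/)
  end
  pairs.map { |pair| pair.join(' ') }
end

person_candidates("John Brown painted the fence Brown.")
# => ["John Brown"]
```

With this small list, "International Business Machines" produces no candidates because none of its words is a known given name. The approach is only as good as the name list, so a human check is still needed.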
Interesting problem you're trying to solve.
source(s):
Pattern matching algorithms.

Look for adjectives, which describe nouns.
http://www.yourdictionary.com/grammar-rules/Examples-of-Adjectives.html
Discard the split fragment that contains the adjective.
Jane was acting very quirky like a mouse.
Jane | very quirky like a mouse
Jane
SXOX (S = start consonant, X = vowel, O = consonant)
The start consonant could be D,F,G,H,J,K,L,M,N,P,R,S,T,V,X,Y,Z+
The more data you get, the better your tuning and pattern matching will get.
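The SXOX encoding of "Jane" can be computed mechanically. Here is a minimal sketch, assuming that only a, e, i, o, u count as vowels and that every other character is treated as a consonant; the `letter_shape` helper is a made-up name for illustration.

```ruby
VOWELS = 'aeiou'.freeze

# Encodes each letter of a word as S (starting consonant),
# O (any other consonant), or X (vowel).
def letter_shape(word)
  shape = word.downcase.each_char.with_index.map do |ch, i|
    if VOWELS.include?(ch)
      'X'
    elsif i.zero?
      'S'
    else
      'O'
    end
  end
  shape.join
end

letter_shape('Jane') # => "SXOX"
```

Shapes computed for a list of known names could then be compared against the shape of each candidate word.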
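# Capitalized words that often begin a sentence and are unlikely to be first names.
sentence_starters = %w[The If In Not When So Next While As After Then What Where Once But]
# Split the content into words, punctuation, and abbreviations such as A.J.
# (`article` is assumed to be an object whose `content` method returns the text.)
words = article.content.scan(/[A-Z]\.[A-Z]\.|[a-zA-Z'\-]+|[.,!]/)
terms = Hash.new(0) # candidate name => number of times seen
(0..(words.size-1)).each do |i|
  # A candidate starts with a capitalized word or initials such as A.J.
  if words[i].match(/^[A-Z][a-z\-]|^[A-Z]\.[A-Z]\./)
    # The next word must be capitalized too, and the pair must be bounded on both
    # sides by punctuation, a lowercase word, or a short all-caps token.
    if words[i-1] && words[i+1] && words[i+2] &&
       words[i+1].match(/^[A-Z][a-z\-]/) &&
       words[i+2].match(/^[\.,\!]|^[a-z]+|^[A-Z]{2,3}|^Jr$|^Sr$/) &&
       words[i-1].match(/^[\.,\!]|^[a-z]+|^[A-Z]{2,3}/) &&
       !sentence_starters.include?(words[i])
      index = "#{words[i]} #{words[i+1]}"
      # Keep a trailing Jr or Sr as part of the name.
      index += " #{words[i+2]}" if words[i+2] == 'Jr' || words[i+2] == 'Sr'
      terms[index] += 1
    end
  end
end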
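Once the counts are collected, the most frequent candidates can be listed first for a human to review. A small sketch, assuming a hash of candidate names to counts like `terms` above; the `rank_candidates` helper, its `min_count` parameter, and the sample counts are made up for illustration.

```ruby
# Sorts candidate names by descending count (ties broken alphabetically),
# dropping those seen fewer than min_count times.
def rank_candidates(terms, min_count: 1)
  terms.select { |_name, count| count >= min_count }
       .sort_by { |name, count| [-count, name] }
end

# Made-up counts for illustration only.
sample = { 'John Brown' => 3, 'Next Tuesday' => 1, 'Jane Doe' => 3 }
rank_candidates(sample, min_count: 2)
# => [["Jane Doe", 3], ["John Brown", 3]]
```

Raising `min_count` trades recall for precision: pairs that show up only once are more likely to be accidental matches.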