java


Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
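You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only labels individual tokens; entitymentions groups adjacent tokens into full mentions, which the JSON output then lists under each sentence's "entitymentions" key alongside the per-token "tokens" array.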
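To illustrate the difference between token-level tags and mentions, here is a minimal Java sketch that joins runs of adjacent tokens sharing the same non-"O" label, using the words and tags from the response above. This is not CoreNLP's implementation; the class name MentionGrouper and its group method are hypothetical, and the simple rule would wrongly merge two distinct entities of the same type that happen to sit next to each other. In practice the entitymentions annotator should be preferred.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: a simplified illustration
// of turning per-token NER labels into multi-token mentions.
public class MentionGrouper {

    /** Joins runs of adjacent tokens that share the same non-"O" NER label. */
    public static List<String> group(String[] words, String[] ners) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i < words.length; i++) {
            String label = ners[i];
            if (!label.equals(currentLabel)) {
                // The label changed: flush the mention collected so far, if any.
                if (!currentLabel.equals("O")) {
                    mentions.add(current + "/" + currentLabel);
                }
                current.setLength(0);
                currentLabel = label;
            }
            if (!label.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Flush a mention that runs to the end of the sentence.
        if (!currentLabel.equals("O")) {
            mentions.add(current + "/" + currentLabel);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and labels copied from the JSON response in the question.
        String[] words = {"Barack", "Obama", "was", "President", "of", "the",
                          "United", "States", "of", "America", "in", "2016"};
        String[] ners = {"PERSON", "PERSON", "O", "O", "O", "O",
                         "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE"};
        // Prints: [Barack Obama/PERSON, United States of America/LOCATION, 2016/DATE]
        System.out.println(group(words, ners));
    }
}
```

A client that cannot change the annotator list could apply this kind of grouping to the "tokens" array as a rough fallback.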
