

Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (adapted from the example on the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
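If you want to send the same request from Java instead of wget, here is a minimal sketch using the JDK's built-in java.net.http client (Java 11+). It assumes a CoreNLP server is already running on localhost:9000, as in the command above; the class name CoreNlpPost and the helper buildUri are hypothetical, chosen only for illustration. The properties JSON is URL-encoded before it goes into the query string so that the quotes and braces arrive intact.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CoreNlpPost {

    // Builds the same kind of URL as the wget command: the properties JSON goes into
    // the "properties" query parameter, URL-encoded so quotes and braces survive.
    public static URI buildUri(String host, int port, String annotators) {
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        String encoded = URLEncoder.encode(props, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/?properties=" + encoded);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: a CoreNLP server is already listening on localhost:9000.
        // The annotator list can be passed as the first argument; defaults to "ner".
        String annotators = args.length > 0 ? args[0] : "ner";
        String text = "Barack Obama was President of the United States of America in 2016";

        // The text to annotate is sent as the POST body, like --post-data in wget.
        HttpRequest request = HttpRequest.newBuilder(buildUri("localhost", 9000, annotators))
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Prints the raw JSON response from the server.
        System.out.println(response.body());
    }
}
```

Keeping the annotator list as a parameter makes it easy to compare the server's output for different pipelines without editing the URL by hand.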
The response looks like this:
{
  "sentences": [{
    "index": 0,
    "tokens": [{
      "index": 1,
      "word": "Barack",
      "originalText": "Barack",
      "lemma": "Barack",
      "characterOffsetBegin": 0,
      "characterOffsetEnd": 6,
      "pos": "NNP",
      "ner": "PERSON",
      "before": "",
      "after": " "
    }, {
      "index": 2,
      "word": "Obama",
      "originalText": "Obama",
      "lemma": "Obama",
      "characterOffsetBegin": 7,
      "characterOffsetEnd": 12,
      "pos": "NNP",
      "ner": "PERSON",
      "before": " ",
      "after": " "
    }, {
      "index": 3,
      "word": "was",
      "originalText": "was",
      "lemma": "be",
      "characterOffsetBegin": 13,
      "characterOffsetEnd": 16,
      "pos": "VBD",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 4,
      "word": "President",
      "originalText": "President",
      "lemma": "President",
      "characterOffsetBegin": 17,
      "characterOffsetEnd": 26,
      "pos": "NNP",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 5,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 27,
      "characterOffsetEnd": 29,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 6,
      "word": "the",
      "originalText": "the",
      "lemma": "the",
      "characterOffsetBegin": 30,
      "characterOffsetEnd": 33,
      "pos": "DT",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 7,
      "word": "United",
      "originalText": "United",
      "lemma": "United",
      "characterOffsetBegin": 34,
      "characterOffsetEnd": 40,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 8,
      "word": "States",
      "originalText": "States",
      "lemma": "States",
      "characterOffsetBegin": 41,
      "characterOffsetEnd": 47,
      "pos": "NNPS",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 9,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 48,
      "characterOffsetEnd": 50,
      "pos": "IN",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 10,
      "word": "America",
      "originalText": "America",
      "lemma": "America",
      "characterOffsetBegin": 51,
      "characterOffsetEnd": 58,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 11,
      "word": "in",
      "originalText": "in",
      "lemma": "in",
      "characterOffsetBegin": 59,
      "characterOffsetEnd": 61,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 12,
      "word": "2016",
      "originalText": "2016",
      "lemma": "2016",
      "characterOffsetBegin": 62,
      "characterOffsetEnd": 66,
      "pos": "CD",
      "ner": "DATE",
      "normalizedNER": "2016",
      "before": " ",
      "after": "",
      "timex": {
        "tid": "t1",
        "type": "DATE",
        "value": "2016"
      }
    }]
  }]
}
Am I doing something wrong? My Java client code recognizes at least "Barack Obama" and "United States of America" as complete named entities, but the server seems to tag each token separately. Any ideas why?
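You should add the entitymentions annotator to your list of annotators. The ner annotator labels each token on its own, which is why every token in the response carries its own "ner" field; entitymentions groups adjacent tokens with the same label into multi-token mentions such as "Barack Obama" and "United States of America". For example:
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner,entitymentions", "outputFormat": "json"}' -O out.json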
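If you only have the per-token output, the basic idea of merging tokens into mentions can be approximated on the client. This is a minimal sketch, not CoreNLP's own implementation: it assumes the token fields from the JSON response (ner, characterOffsetBegin, characterOffsetEnd) have already been parsed into a hypothetical Token record, merges every run of consecutive tokens that share the same non-"O" label, and slices the mention text out of the original sentence using the first token's begin offset and the last token's end offset.

```java
import java.util.ArrayList;
import java.util.List;

public class NerMentionGrouper {

    // Hypothetical minimal view of one token from the JSON response.
    public record Token(String word, String ner, int begin, int end) {}

    // A merged span of consecutive tokens sharing the same non-"O" NER label.
    public record Mention(String text, String ner, int begin, int end) {}

    // Groups consecutive tokens with the same label (skipping "O") into mentions.
    public static List<Mention> group(String text, List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token first = tokens.get(i);
            if ("O".equals(first.ner())) {
                i++;
                continue;
            }
            int j = i;
            // Extend the span while the next token carries the same label.
            while (j + 1 < tokens.size() && first.ner().equals(tokens.get(j + 1).ner())) {
                j++;
            }
            Token last = tokens.get(j);
            // Character offsets let us recover the exact surface text, spaces included.
            mentions.add(new Mention(text.substring(first.begin(), last.end()),
                    first.ner(), first.begin(), last.end()));
            i = j + 1;
        }
        return mentions;
    }

    public static void main(String[] args) {
        String text = "Barack Obama was President of the United States of America in 2016";
        // Labels and offsets copied from the server's JSON response.
        List<Token> tokens = List.of(
                new Token("Barack", "PERSON", 0, 6),
                new Token("Obama", "PERSON", 7, 12),
                new Token("was", "O", 13, 16),
                new Token("President", "O", 17, 26),
                new Token("of", "O", 27, 29),
                new Token("the", "O", 30, 33),
                new Token("United", "LOCATION", 34, 40),
                new Token("States", "LOCATION", 41, 47),
                new Token("of", "LOCATION", 48, 50),
                new Token("America", "LOCATION", 51, 58),
                new Token("in", "O", 59, 61),
                new Token("2016", "DATE", 62, 66));
        for (Mention m : group(text, tokens)) {
            System.out.println(m.ner() + "\t" + m.text());
        }
    }
}
```

With the tokens from the response, main prints three mentions: PERSON "Barack Obama", LOCATION "United States of America" and DATE "2016". Because the sketch merges any run of identical labels, two different entities of the same type that happen to be adjacent would come out as a single mention, so the server-side annotator is still the better choice when you can use it.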
