java


Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
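The ner annotator only assigns a tag to each individual token, so the per-token output above is expected. To have consecutive tokens grouped into full entities, add the entitymentions annotator to your list of annotators, for example {"annotators": "ner,entitymentions", "outputFormat": "json"}. Each sentence in the JSON response should then also carry a list of entity mentions alongside its tokens.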
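To illustrate what that grouping amounts to, here is a minimal sketch in plain Java that merges consecutive tokens sharing the same non-"O" NER tag, using the words and tags from the response above. The class name MentionGrouper and the "text/TAG" output format are made up for this illustration; this is not CoreNLP's actual implementation, which handles more cases, only an approximation of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: a simplified stand-in for the
// kind of grouping the entitymentions annotator performs.
public class MentionGrouper {

    /** Joins runs of adjacent tokens with the same non-"O" tag into "text/TAG" strings. */
    public static List<String> group(String[] words, String[] tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.length; i++) {
            if (!tags[i].equals(currentTag)) {
                // The tag changed: close the mention in progress, if any.
                if (!currentTag.equals("O")) {
                    mentions.add(current + "/" + currentTag);
                }
                current.setLength(0);
                currentTag = tags[i];
            }
            if (!tags[i].equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(current + "/" + currentTag);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and "ner" values copied from the JSON response in the question.
        String[] words = {"Barack", "Obama", "was", "President", "of", "the",
                          "United", "States", "of", "America", "in", "2016"};
        String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O",
                         "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE"};
        // prints [Barack Obama/PERSON, United States of America/LOCATION, 2016/DATE]
        System.out.println(group(words, tags));
    }
}
```

In practice you would let the server do this for you by enabling entitymentions; the sketch is only useful if you are stuck with token-level output and need to post-process it on the client side.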
