java


Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to tag each token separately. Any ideas why?
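You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only assigns a label to each token; entitymentions groups adjacent tokens into full mentions, which the JSON output should then list for each sentence under "entitymentions".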
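To illustrate the difference between per-token labels and full mentions, here is a minimal sketch in plain Java that merges consecutive tokens sharing the same non-"O" label. It is a simplification for illustration only, not CoreNLP's actual implementation; the class and method names (`MentionGrouper`, `groupMentions`, `sampleTokens`) are hypothetical, and the token data is copied from the response above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: merges per-token NER labels into mentions.
class MentionGrouper {

    // Each token is a {word, ner} pair, mirroring the "word" and "ner" fields of the JSON response.
    static List<String[]> sampleTokens() {
        return Arrays.asList(
            new String[] {"Barack", "PERSON"}, new String[] {"Obama", "PERSON"},
            new String[] {"was", "O"}, new String[] {"President", "O"},
            new String[] {"of", "O"}, new String[] {"the", "O"},
            new String[] {"United", "LOCATION"}, new String[] {"States", "LOCATION"},
            new String[] {"of", "LOCATION"}, new String[] {"America", "LOCATION"},
            new String[] {"in", "O"}, new String[] {"2016", "DATE"});
    }

    // Assumption: a mention is a maximal run of adjacent tokens with the same non-"O" label.
    static List<String> groupMentions(List<String[]> tokens) {
        List<String> mentions = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String currentLabel = "O";
        for (String[] token : tokens) {
            String word = token[0];
            String label = token[1];
            if (!label.equals(currentLabel)) {
                // The label changed: close the open mention, if any, and start fresh.
                if (!currentLabel.equals("O")) {
                    mentions.add(text + " (" + currentLabel + ")");
                }
                text.setLength(0);
                currentLabel = label;
            }
            if (!label.equals("O")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(word);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentLabel.equals("O")) {
            mentions.add(text + " (" + currentLabel + ")");
        }
        return mentions;
    }

    public static void main(String[] args) {
        for (String mention : groupMentions(sampleTokens())) {
            System.out.println(mention);
        }
    }
}
```

On the sample tokens this yields three mentions: "Barack Obama (PERSON)", "United States of America (LOCATION)" and "2016 (DATE)". A real pipeline should rely on the entitymentions annotator instead, since this sketch cannot, for instance, separate two distinct entities of the same type that sit next to each other.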
