

Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command (adapted from the Stanford CoreNLP site) to post the data:
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
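The same request can also be sent from Java with the JDK's built-in java.net.http.HttpClient (Java 11 or later). This is a minimal sketch that assumes the server is listening on localhost:9000, as in the wget command. The class and method names are hypothetical. The properties JSON is URL-encoded before it goes into the query string. It is also written without spaces, so no encoded spaces end up in the URL.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CoreNlpPost {

    /** Builds the server URL with the properties JSON URL-encoded as the "properties" query parameter. */
    public static URI buildUri(String host, String annotators) {
        // No spaces in the JSON, so nothing gets encoded as '+'.
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        String encoded = URLEncoder.encode(props, StandardCharsets.UTF_8);
        return URI.create(host + "/?properties=" + encoded);
    }

    public static void main(String[] args) throws InterruptedException {
        String text = "Barack Obama was President of the United States of America in 2016";
        HttpRequest request = HttpRequest.newBuilder(buildUri("http://localhost:9000", "ner"))
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        // Plain HTTP/1.1, matching what wget sends.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Request failed (is the server running?): " + e.getMessage());
        }
    }
}
```

To change which annotators the server runs, pass a different comma-separated list to buildUri.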
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code recognizes at least "Barack Obama" and "United States of America" as complete named entities. The server, however, seems to tag each token separately. Any ideas why?
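You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only labels each token individually. The entitymentions annotator groups consecutive tokens with the same label into multi-token mentions, and the JSON output lists these in an "entitymentions" array for each sentence.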
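If the annotator list cannot be changed, you can get a rough approximation on the client side by merging runs of consecutive tokens that share the same non-"O" ner label. The sketch below is hypothetical (Java 16+ for records). It is not how CoreNLP's entitymentions annotator is implemented, and the real annotator may apply extra rules, so results can differ. The tokens are copied from the JSON response in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class NerGrouper {

    /** One token from the server's JSON: its "word" and "ner" fields. */
    public record Token(String word, String ner) {}

    /** A merged mention: the joined text and the shared ner label. */
    public record Mention(String text, String ner) {}

    /** Merges runs of consecutive tokens with the same label, skipping "O" (outside any entity). */
    public static List<Mention> group(List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = null;
        for (Token t : tokens) {
            if (t.ner().equals(currentLabel)) {
                // Same label as the previous token: extend the open mention.
                current.append(' ').append(t.word());
                continue;
            }
            // Label changed: close the open mention, if there is one.
            if (currentLabel != null) {
                mentions.add(new Mention(current.toString(), currentLabel));
            }
            current.setLength(0);
            currentLabel = null;
            if (!"O".equals(t.ner())) {
                // Start a new mention with this token.
                current.append(t.word());
                currentLabel = t.ner();
            }
        }
        if (currentLabel != null) {
            mentions.add(new Mention(current.toString(), currentLabel));
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Barack", "PERSON"), new Token("Obama", "PERSON"),
                new Token("was", "O"), new Token("President", "O"),
                new Token("of", "O"), new Token("the", "O"),
                new Token("United", "LOCATION"), new Token("States", "LOCATION"),
                new Token("of", "LOCATION"), new Token("America", "LOCATION"),
                new Token("in", "O"), new Token("2016", "DATE"));
        for (Mention m : group(tokens)) {
            System.out.println(m.ner() + ": " + m.text());
        }
    }
}
```

The program prints "PERSON: Barack Obama", "LOCATION: United States of America" and "DATE: 2016" on separate lines. The sketch only compares labels. As a result, two different entities that sit next to each other with the same label (for example, two adjacent city names) would be merged into one mention. Tokens are also rejoined with single spaces instead of the original whitespace from the "before"/"after" fields.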


