Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
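For anyone reproducing this from Java instead of wget, here is a minimal sketch of the same request using the JDK's java.net.http client (Java 11+). It assumes a CoreNLP server is already listening on localhost:9000 as in the command above. The class and method names (CoreNlpRequest, buildUri, annotate) are hypothetical helpers, not part of CoreNLP. The properties JSON is URL-encoded into the query string, and the text is sent as the POST body, mirroring --post-data.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that sends text to a CoreNLP server the way the wget command does.
class CoreNlpRequest {

    // Builds http://host:port/?properties=<url-encoded JSON>, mirroring the wget URL.
    static URI buildUri(String host, int port, String annotators) {
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        String encoded = URLEncoder.encode(props, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/?properties=" + encoded);
    }

    // POSTs the raw text as the request body (like --post-data) and returns the JSON reply.
    static String annotate(String host, int port, String annotators, String text)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildUri(host, port, annotators))
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a running server on localhost:9000, as in the question.
        String json = annotate("localhost", 9000, "ner",
                "Barack Obama was President of the United States of America in 2016");
        System.out.println(json);
    }
}
```

Passing the annotator list as a parameter makes it easy to compare the per-token output of "ner" with other annotator combinations without editing the URL by hand.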
The response looks like this:
{
  "sentences": [{
    "index": 0,
    "tokens": [{
      "index": 1,
      "word": "Barack",
      "originalText": "Barack",
      "lemma": "Barack",
      "characterOffsetBegin": 0,
      "characterOffsetEnd": 6,
      "pos": "NNP",
      "ner": "PERSON",
      "before": "",
      "after": " "
    }, {
      "index": 2,
      "word": "Obama",
      "originalText": "Obama",
      "lemma": "Obama",
      "characterOffsetBegin": 7,
      "characterOffsetEnd": 12,
      "pos": "NNP",
      "ner": "PERSON",
      "before": " ",
      "after": " "
    }, {
      "index": 3,
      "word": "was",
      "originalText": "was",
      "lemma": "be",
      "characterOffsetBegin": 13,
      "characterOffsetEnd": 16,
      "pos": "VBD",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 4,
      "word": "President",
      "originalText": "President",
      "lemma": "President",
      "characterOffsetBegin": 17,
      "characterOffsetEnd": 26,
      "pos": "NNP",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 5,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 27,
      "characterOffsetEnd": 29,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 6,
      "word": "the",
      "originalText": "the",
      "lemma": "the",
      "characterOffsetBegin": 30,
      "characterOffsetEnd": 33,
      "pos": "DT",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 7,
      "word": "United",
      "originalText": "United",
      "lemma": "United",
      "characterOffsetBegin": 34,
      "characterOffsetEnd": 40,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 8,
      "word": "States",
      "originalText": "States",
      "lemma": "States",
      "characterOffsetBegin": 41,
      "characterOffsetEnd": 47,
      "pos": "NNPS",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 9,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 48,
      "characterOffsetEnd": 50,
      "pos": "IN",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 10,
      "word": "America",
      "originalText": "America",
      "lemma": "America",
      "characterOffsetBegin": 51,
      "characterOffsetEnd": 58,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 11,
      "word": "in",
      "originalText": "in",
      "lemma": "in",
      "characterOffsetBegin": 59,
      "characterOffsetEnd": 61,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 12,
      "word": "2016",
      "originalText": "2016",
      "lemma": "2016",
      "characterOffsetBegin": 62,
      "characterOffsetEnd": 66,
      "pos": "CD",
      "ner": "DATE",
      "normalizedNER": "2016",
      "before": " ",
      "after": "",
      "timex": {
        "tid": "t1",
        "type": "DATE",
        "value": "2016"
      }
    }]
  }]
}
Am I doing something wrong? I have Java client code that recognizes at least Barack Obama and United States of America as whole named entities, but the server seems to treat each token separately. Any ideas why?
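You should add the entitymentions annotator to your list of annotators, e.g. "annotators": "ner,entitymentions". The ner annotator only assigns a label to each token; entitymentions groups consecutive tokens with the same label into mentions, which the JSON output then lists in an "entitymentions" array for each sentence.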
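To make the difference between per-token labels and mentions concrete, here is a minimal client-side sketch that merges runs of consecutive tokens sharing the same non-"O" ner label, using the character offsets from the JSON above. This is only an approximation of what entitymentions does, not CoreNLP's actual implementation: for instance, it would merge two different people whose names happen to be adjacent. The names MentionGrouper, Token, Mention and group are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side grouping; an approximation of entitymentions, not CoreNLP's code.
class MentionGrouper {

    // Minimal view of one entry in the JSON "tokens" array.
    static final class Token {
        final String word;
        final String ner;
        final int begin; // characterOffsetBegin
        final int end;   // characterOffsetEnd

        Token(String word, String ner, int begin, int end) {
            this.word = word;
            this.ner = ner;
            this.begin = begin;
            this.end = end;
        }
    }

    // One merged span: the covered text and its shared NER label.
    static final class Mention {
        final String text;
        final String ner;

        Mention(String text, String ner) {
            this.text = text;
            this.ner = ner;
        }

        @Override
        public String toString() {
            return text + " (" + ner + ")";
        }
    }

    // Merges runs of consecutive tokens that carry the same non-"O" label.
    // The span is sliced from the original text by character offsets, so inner spacing survives.
    static List<Mention> group(String text, List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token first = tokens.get(i);
            if (first.ner.equals("O")) {
                i++;
                continue;
            }
            int j = i;
            while (j + 1 < tokens.size() && tokens.get(j + 1).ner.equals(first.ner)) {
                j++;
            }
            Token last = tokens.get(j);
            mentions.add(new Mention(text.substring(first.begin, last.end), first.ner));
            i = j + 1;
        }
        return mentions;
    }

    public static void main(String[] args) {
        String text = "Barack Obama was President of the United States of America in 2016";
        // Tokens copied from the server response in the question.
        List<Token> tokens = List.of(
                new Token("Barack", "PERSON", 0, 6),
                new Token("Obama", "PERSON", 7, 12),
                new Token("was", "O", 13, 16),
                new Token("President", "O", 17, 26),
                new Token("of", "O", 27, 29),
                new Token("the", "O", 30, 33),
                new Token("United", "LOCATION", 34, 40),
                new Token("States", "LOCATION", 41, 47),
                new Token("of", "LOCATION", 48, 50),
                new Token("America", "LOCATION", 51, 58),
                new Token("in", "O", 59, 61),
                new Token("2016", "DATE", 62, 66));
        // Prints [Barack Obama (PERSON), United States of America (LOCATION), 2016 (DATE)]
        System.out.println(group(text, tokens));
    }
}
```

Doing this on the server with entitymentions is usually preferable, since the annotator's grouping is maintained alongside the NER models; the sketch is mainly useful for seeing why per-token labels alone do not give you "Barack Obama" as one entity.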