java


Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (adapted from the example on the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
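You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only labels individual tokens, which is what your response shows. The entitymentions annotator groups adjacent tokens into full entity mentions, and those should then be listed in the JSON output alongside the per-token annotations.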
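To make the difference concrete, here is a minimal sketch of the kind of grouping that turns per-token tags into whole mentions. It is a simplified illustration, not CoreNLP's actual implementation: the class name MentionGrouper and the method group are hypothetical, and the sketch simply merges runs of adjacent tokens that share the same non-"O" tag, using the words and tags from the response above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of CoreNLP.
public class MentionGrouper {

    /**
     * Merge runs of adjacent tokens that share the same non-"O" NER tag.
     * words and tags are parallel lists, one entry per token.
     * Each result has the form "TAG: joined words".
     */
    public static List<String> group(List<String> words, List<String> tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the open mention, if there is one.
                if (!currentTag.equals("O")) {
                    mentions.add(currentTag + ": " + text);
                }
                text.setLength(0);
                currentTag = tag;
            }
            if (!currentTag.equals("O")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(words.get(i));
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(currentTag + ": " + text);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and tags copied from the JSON response in the question.
        List<String> words = List.of("Barack", "Obama", "was", "President", "of", "the",
                "United", "States", "of", "America", "in", "2016");
        List<String> tags = List.of("PERSON", "PERSON", "O", "O", "O", "O",
                "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE");
        for (String mention : group(words, tags)) {
            System.out.println(mention);
        }
    }
}
```

Run on the tokens from the question, this prints one line each for "Barack Obama", "United States of America" and "2016". The real annotator does this on the server side, so with entitymentions enabled you should not need client-side grouping like this.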
