Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to tag each token separately. Any ideas why?

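You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only assigns a tag to each individual token, which is what you see in the response above. The entitymentions annotator groups adjacent tokens that carry the same tag into full mentions, and these should then be listed per sentence in the JSON output.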
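
To illustrate what that grouping amounts to, here is a minimal sketch in plain Java that merges runs of adjacent tokens sharing the same non-"O" tag, using the words and tags from the response in the question. This is not CoreNLP's actual implementation (which, as far as I know, also takes other information into account); the class name EntityMentionSketch and the method mergeMentions are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: a simplified picture of
// how per-token NER tags can be merged into multi-token mentions.
public class EntityMentionSketch {

    /** Merges runs of adjacent tokens that share the same non-"O" NER tag. */
    static List<String> mergeMentions(String[] words, String[] nerTags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.length; i++) {
            String tag = nerTags[i];
            if (!tag.equals(currentTag)) {
                // The tag changed: close the mention in progress, if any.
                if (!currentTag.equals("O")) {
                    mentions.add(current + " [" + currentTag + "]");
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(current + " [" + currentTag + "]");
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and tags copied from the JSON response in the question.
        String[] words = {"Barack", "Obama", "was", "President", "of", "the",
                "United", "States", "of", "America", "in", "2016"};
        String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O",
                "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE"};
        for (String mention : mergeMentions(words, tags)) {
            System.out.println(mention);
        }
    }
}
```

Run on the sentence from the question, this prints Barack Obama [PERSON], United States of America [LOCATION] and 2016 [DATE], one per line. In practice you would let the server do this via entitymentions rather than merging tokens on the client.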

