java


Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? My Java client code at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
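You are not doing anything wrong: the ner annotator only assigns a label to each token, it does not group tokens. You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". It combines adjacent tokens that belong to the same entity into a single mention, and the mentions should then appear in an "entitymentions" list inside each sentence of the JSON response.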
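To make the difference between token-level labels and entity mentions concrete, here is a minimal Java sketch that merges consecutive tokens sharing the same non-"O" label into one mention. It is only an illustration of the idea, not CoreNLP's actual implementation (the real annotator applies more rules than this). The names MentionGrouper, Token, Mention, group and sampleTokens are hypothetical and are not part of the CoreNLP API; the sample data is copied from the "word" and "ner" fields of the response in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: groups per-token NER labels into mentions.
class MentionGrouper {

    // One token with its label, mirroring the "word" and "ner" JSON fields.
    static final class Token {
        final String word;
        final String ner;
        Token(String word, String ner) { this.word = word; this.ner = ner; }
    }

    // A run of adjacent tokens that share the same non-"O" label.
    static final class Mention {
        final String text;
        final String ner;
        Mention(String text, String ner) { this.text = text; this.ner = ner; }
    }

    // Simplified rule (an assumption): a mention ends whenever the label changes.
    static List<Mention> group(List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String current = "O";
        for (Token t : tokens) {
            if (!t.ner.equals(current)) {
                // Label changed: close the open mention, if any, and start over.
                if (!current.equals("O")) {
                    mentions.add(new Mention(text.toString(), current));
                }
                text.setLength(0);
                current = t.ner;
            }
            if (!current.equals("O")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(t.word);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!current.equals("O")) {
            mentions.add(new Mention(text.toString(), current));
        }
        return mentions;
    }

    // Tokens and labels taken from the server response in the question.
    static List<Token> sampleTokens() {
        return Arrays.asList(
            new Token("Barack", "PERSON"), new Token("Obama", "PERSON"),
            new Token("was", "O"), new Token("President", "O"),
            new Token("of", "O"), new Token("the", "O"),
            new Token("United", "LOCATION"), new Token("States", "LOCATION"),
            new Token("of", "LOCATION"), new Token("America", "LOCATION"),
            new Token("in", "O"), new Token("2016", "DATE"));
    }

    public static void main(String[] args) {
        for (Mention m : group(sampleTokens())) {
            System.out.println(m.ner + "\t" + m.text);
        }
    }
}
```

Run on the tokens from the question, this yields three mentions: PERSON "Barack Obama", LOCATION "United States of America" and DATE "2016". In practice you would let the entitymentions annotator do this on the server rather than post-processing the tokens yourself.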
