

Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
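For reference, here is a rough Java equivalent of the wget call above, using the JDK's java.net.http client (Java 11+). It is a sketch that assumes a server on localhost:9000. The class and method names are hypothetical. The text to annotate is sent as the POST body, and the properties JSON goes in the query string. The JSON is written without spaces and percent-encoded so that it is safe in a URL. The annotators string is a parameter, so it can be changed without touching the rest.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CoreNlpRequest {

    // Builds the same URL the wget command uses. The properties JSON has no
    // spaces and is percent-encoded, so braces, quotes and commas are URL-safe.
    static URI buildUri(String host, int port, String annotators) {
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        String encoded = URLEncoder.encode(props, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/?properties=" + encoded);
    }

    // Builds a POST request whose body is the raw text to annotate.
    static HttpRequest buildRequest(URI uri, String text) {
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI uri = buildUri("localhost", 9000, "ner");
        HttpRequest request = buildRequest(uri,
                "Barack Obama was President of the United States of America in 2016");
        // Needs a CoreNLP server listening on localhost:9000; prints the JSON response.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```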
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? I have Java client code that recognizes at least "Barack Obama" and "United States of America" as complete named entities, but the service seems to treat each token separately. Any ideas why?
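The ner annotator labels individual tokens only: every token gets its own "ner" value, and a multi-token entity appears as a run of consecutive tokens with the same label. In your output, Barack and Obama are both PERSON, and United, States, of and America are all LOCATION. To get whole entities back, add the entitymentions annotator to your list of annotators, for example:
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner,entitymentions", "outputFormat": "json"}' -O out.json
Each sentence in the response should then also contain an "entitymentions" list with the grouped mentions.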
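If you cannot change the annotators, a crude client-side fallback is to merge runs of adjacent tokens that share the same non-O label. The sketch below is only an approximation of the output of entitymentions, not its actual algorithm. The Token and Mention classes are hypothetical stand-ins for the "word" and "ner" fields of the JSON. Words are rejoined with single spaces rather than the original whitespace, which the characterOffset fields would preserve. Two different entities of the same type that happen to be adjacent would be merged into one.

```java
import java.util.ArrayList;
import java.util.List;

public class NerGrouping {

    // Hypothetical token type: just the "word" and "ner" fields from the JSON.
    static final class Token {
        final String word;
        final String ner;
        Token(String word, String ner) { this.word = word; this.ner = ner; }
    }

    // Hypothetical span type: the joined words and their shared NER label.
    static final class Mention {
        final String text;
        final String ner;
        Mention(String text, String ner) { this.text = text; this.ner = ner; }
        @Override public String toString() { return text + " [" + ner + "]"; }
    }

    // Merges runs of adjacent tokens that carry the same non-"O" label.
    static List<Mention> group(List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (Token t : tokens) {
            if (!t.ner.equals("O") && t.ner.equals(currentLabel)) {
                current.append(' ').append(t.word);   // extend the current run
                continue;
            }
            if (!currentLabel.equals("O")) {          // label changed: close the run
                mentions.add(new Mention(current.toString(), currentLabel));
            }
            current.setLength(0);
            currentLabel = t.ner;
            if (!currentLabel.equals("O")) {
                current.append(t.word);               // start a new run
            }
        }
        if (!currentLabel.equals("O")) {              // close a run at end of input
            mentions.add(new Mention(current.toString(), currentLabel));
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Barack", "PERSON"), new Token("Obama", "PERSON"),
                new Token("was", "O"), new Token("President", "O"),
                new Token("of", "O"), new Token("the", "O"),
                new Token("United", "LOCATION"), new Token("States", "LOCATION"),
                new Token("of", "LOCATION"), new Token("America", "LOCATION"),
                new Token("in", "O"), new Token("2016", "DATE"));
        for (Mention m : group(tokens)) {
            System.out.println(m);
        }
    }
}
```

With the tokens from the response in the question, this prints "Barack Obama [PERSON]", "United States of America [LOCATION]" and "2016 [DATE]" on separate lines.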
