Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? I have Java client code that at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
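You are not doing anything wrong: the ner annotator only assigns a label to each individual token, which is why every token in the response carries its own "ner" field. To get multi-token entities, add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". Each sentence in the JSON output should then also include a list of entity mentions that covers the full spans, such as "Barack Obama" and "United States of America".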
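To make the difference concrete, here is a minimal sketch of the kind of grouping that turns per-token tags into whole mentions. It is not CoreNLP's implementation of entitymentions; the class name MentionGrouper and the method group are hypothetical, and the rule used (merge adjacent tokens that share the same non-"O" tag) is a simplifying assumption that would, for instance, also merge two distinct entities of the same type that happen to sit next to each other.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of CoreNLP.
class MentionGrouper {

    // Merge runs of adjacent tokens that share the same non-"O" NER tag.
    static List<String> group(String[] words, String[] tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.length; i++) {
            String tag = tags[i];
            if (!tag.equals(currentTag)) {
                // The tag changed, so close any mention that was open.
                if (!currentTag.equals("O")) {
                    mentions.add(currentTag + ": " + current);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(currentTag + ": " + current);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and tags taken from the response in the question.
        String[] words = {"Barack", "Obama", "was", "President", "of", "the",
                          "United", "States", "of", "America", "in", "2016"};
        String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O",
                         "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE"};
        group(words, tags).forEach(System.out::println);
    }
}
```

Run on the tokens from the question, this prints one line each for "PERSON: Barack Obama", "LOCATION: United States of America" and "DATE: 2016". In practice you would let the server do this by enabling entitymentions rather than post-processing the tokens yourself.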
