Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (mostly copied from the Stanford site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
Am I doing something wrong? I have Java client code that at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to label each token separately. Any ideas why?
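The ner annotator only assigns a label to each individual token, which is what the output above shows. To get multi-token entities, add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions" in the request properties. The JSON for each sentence should then also include an "entitymentions" array in which adjacent tokens such as "Barack" and "Obama" are grouped into a single mention.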
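To illustrate what grouping token-level labels into mentions means, here is a minimal Java sketch that merges runs of adjacent tokens sharing the same non-"O" label. It is not CoreNLP's own implementation: the class `NerMentionMerger`, its `Mention` type, and the `merge` method are hypothetical names introduced for this example, and the inputs are assumed to be the "word" and "ner" values already extracted from the JSON response. It is a simplification in two ways: tokens are joined with a single space rather than the original "before"/"after" whitespace, and two distinct entities of the same type that sit directly next to each other would be merged into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of CoreNLP): merges per-token NER labels into mentions. */
final class NerMentionMerger {

    /** A merged mention: the joined token text and the label shared by its tokens. */
    static final class Mention {
        final String text;
        final String ner;

        Mention(String text, String ner) {
            this.text = text;
            this.ner = ner;
        }

        @Override
        public String toString() {
            return text + "/" + ner;
        }
    }

    /**
     * Groups runs of adjacent tokens that carry the same non-"O" label.
     * words and nerTags are parallel lists with one entry per token.
     */
    static List<Mention> merge(List<String> words, List<String> nerTags) {
        if (words.size() != nerTags.size()) {
            throw new IllegalArgumentException("words and nerTags must have the same length");
        }
        List<Mention> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = nerTags.get(i);
            if (!tag.equals(currentTag)) {
                // The label changed: close the open mention, if there is one.
                if (!"O".equals(currentTag)) {
                    mentions.add(new Mention(current.toString(), currentTag));
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!"O".equals(tag)) {
                // Simplification: join tokens with one space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!"O".equals(currentTag)) {
            mentions.add(new Mention(current.toString(), currentTag));
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Token texts and labels taken from the response shown in the question.
        List<String> words = Arrays.asList("Barack", "Obama", "was", "President", "of", "the",
                "United", "States", "of", "America", "in", "2016");
        List<String> tags = Arrays.asList("PERSON", "PERSON", "O", "O", "O", "O",
                "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE");
        // Prints: [Barack Obama/PERSON, United States of America/LOCATION, 2016/DATE]
        System.out.println(merge(words, tags));
    }
}
```

In practice the entitymentions annotator is the better choice, since it runs on the server and does not depend on this kind of client-side post-processing.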
