Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (adapted from an example on the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
"sentences": [{
"index": 0,
"tokens": [{
"index": 1,
"word": "Barack",
"originalText": "Barack",
"lemma": "Barack",
"characterOffsetBegin": 0,
"characterOffsetEnd": 6,
"pos": "NNP",
"ner": "PERSON",
"before": "",
"after": " "
}, {
"index": 2,
"word": "Obama",
"originalText": "Obama",
"lemma": "Obama",
"characterOffsetBegin": 7,
"characterOffsetEnd": 12,
"pos": "NNP",
"ner": "PERSON",
"before": " ",
"after": " "
}, {
"index": 3,
"word": "was",
"originalText": "was",
"lemma": "be",
"characterOffsetBegin": 13,
"characterOffsetEnd": 16,
"pos": "VBD",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 4,
"word": "President",
"originalText": "President",
"lemma": "President",
"characterOffsetBegin": 17,
"characterOffsetEnd": 26,
"pos": "NNP",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 5,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 27,
"characterOffsetEnd": 29,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 6,
"word": "the",
"originalText": "the",
"lemma": "the",
"characterOffsetBegin": 30,
"characterOffsetEnd": 33,
"pos": "DT",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 7,
"word": "United",
"originalText": "United",
"lemma": "United",
"characterOffsetBegin": 34,
"characterOffsetEnd": 40,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 8,
"word": "States",
"originalText": "States",
"lemma": "States",
"characterOffsetBegin": 41,
"characterOffsetEnd": 47,
"pos": "NNPS",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 9,
"word": "of",
"originalText": "of",
"lemma": "of",
"characterOffsetBegin": 48,
"characterOffsetEnd": 50,
"pos": "IN",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 10,
"word": "America",
"originalText": "America",
"lemma": "America",
"characterOffsetBegin": 51,
"characterOffsetEnd": 58,
"pos": "NNP",
"ner": "LOCATION",
"before": " ",
"after": " "
}, {
"index": 11,
"word": "in",
"originalText": "in",
"lemma": "in",
"characterOffsetBegin": 59,
"characterOffsetEnd": 61,
"pos": "IN",
"ner": "O",
"before": " ",
"after": " "
}, {
"index": 12,
"word": "2016",
"originalText": "2016",
"lemma": "2016",
"characterOffsetBegin": 62,
"characterOffsetEnd": 66,
"pos": "CD",
"ner": "DATE",
"normalizedNER": "2016",
"before": " ",
"after": "",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "2016"
}
}]
}]
}
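Am I doing something wrong? I have Java client code that at least recognizes "Barack Obama" and "United States of America" as complete named entities, but the server seems to treat each token separately. Any ideas why?
You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only assigns a tag to each token; entitymentions groups adjacent tokens into full mentions, which the JSON output reports in a separate "entitymentions" array on each sentence. The per-token "ner" fields stay as they are.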
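If you cannot change the annotator list, the grouping can be approximated on the client from the per-token output shown in the question. The following is only a rough sketch of that idea, not CoreNLP's actual implementation: the class name `MentionGrouping` and the method `groupMentions` are hypothetical, and it assumes the `word` and `ner` fields have already been extracted from the JSON into two parallel arrays. Note that it would merge two distinct entities of the same type if they happen to be adjacent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: merges runs of adjacent
// tokens that share the same non-"O" NER tag into a single mention.
class MentionGrouping {

    static List<String> groupMentions(String[] words, String[] tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";

        for (int i = 0; i < words.length; i++) {
            // A tag change closes the mention that was being built, if any.
            if (!tags[i].equals(currentTag)) {
                if (!currentTag.equals("O")) {
                    mentions.add(currentTag + ": " + current);
                }
                current.setLength(0);
                currentTag = tags[i];
            }
            // Tokens tagged "O" are outside any entity and are skipped.
            if (!tags[i].equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Flush a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(currentTag + ": " + current);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Words and tags copied from the JSON response in the question.
        String[] words = {"Barack", "Obama", "was", "President", "of", "the",
                          "United", "States", "of", "America", "in", "2016"};
        String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O",
                         "LOCATION", "LOCATION", "LOCATION", "LOCATION", "O", "DATE"};
        for (String mention : groupMentions(words, tags)) {
            System.out.println(mention);
        }
    }
}
```

For the sentence in the question this yields three mentions: "PERSON: Barack Obama", "LOCATION: United States of America" and "DATE: 2016".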
