Why does Stanford CoreNLP server split named entities into single tokens?


I'm using this command to post the data (adapted from the Stanford CoreNLP site):
wget --post-data 'Barack Obama was President of the United States of America in 2016' 'localhost:9000/?properties={"annotators": "ner", "outputFormat": "json"}' -O out.json
The response looks like this:
{
  "sentences": [{
    "index": 0,
    "tokens": [{
      "index": 1,
      "word": "Barack",
      "originalText": "Barack",
      "lemma": "Barack",
      "characterOffsetBegin": 0,
      "characterOffsetEnd": 6,
      "pos": "NNP",
      "ner": "PERSON",
      "before": "",
      "after": " "
    }, {
      "index": 2,
      "word": "Obama",
      "originalText": "Obama",
      "lemma": "Obama",
      "characterOffsetBegin": 7,
      "characterOffsetEnd": 12,
      "pos": "NNP",
      "ner": "PERSON",
      "before": " ",
      "after": " "
    }, {
      "index": 3,
      "word": "was",
      "originalText": "was",
      "lemma": "be",
      "characterOffsetBegin": 13,
      "characterOffsetEnd": 16,
      "pos": "VBD",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 4,
      "word": "President",
      "originalText": "President",
      "lemma": "President",
      "characterOffsetBegin": 17,
      "characterOffsetEnd": 26,
      "pos": "NNP",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 5,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 27,
      "characterOffsetEnd": 29,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 6,
      "word": "the",
      "originalText": "the",
      "lemma": "the",
      "characterOffsetBegin": 30,
      "characterOffsetEnd": 33,
      "pos": "DT",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 7,
      "word": "United",
      "originalText": "United",
      "lemma": "United",
      "characterOffsetBegin": 34,
      "characterOffsetEnd": 40,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 8,
      "word": "States",
      "originalText": "States",
      "lemma": "States",
      "characterOffsetBegin": 41,
      "characterOffsetEnd": 47,
      "pos": "NNPS",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 9,
      "word": "of",
      "originalText": "of",
      "lemma": "of",
      "characterOffsetBegin": 48,
      "characterOffsetEnd": 50,
      "pos": "IN",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 10,
      "word": "America",
      "originalText": "America",
      "lemma": "America",
      "characterOffsetBegin": 51,
      "characterOffsetEnd": 58,
      "pos": "NNP",
      "ner": "LOCATION",
      "before": " ",
      "after": " "
    }, {
      "index": 11,
      "word": "in",
      "originalText": "in",
      "lemma": "in",
      "characterOffsetBegin": 59,
      "characterOffsetEnd": 61,
      "pos": "IN",
      "ner": "O",
      "before": " ",
      "after": " "
    }, {
      "index": 12,
      "word": "2016",
      "originalText": "2016",
      "lemma": "2016",
      "characterOffsetBegin": 62,
      "characterOffsetEnd": 66,
      "pos": "CD",
      "ner": "DATE",
      "normalizedNER": "2016",
      "before": " ",
      "after": "",
      "timex": {
        "tid": "t1",
        "type": "DATE",
        "value": "2016"
      }
    }]
  }]
}
Am I doing something wrong? My Java client code recognizes at least "Barack Obama" and "United States of America" as complete named entities, but the server seems to tag each token separately. Any ideas why?
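You should add the entitymentions annotator to your list of annotators, for example "annotators": "ner,entitymentions". The ner annotator only assigns a label to each token; entitymentions is the step that groups adjacent tokens into full entity mentions.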
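To make the difference concrete, here is a minimal sketch of the kind of grouping that turns per-token labels into multi-word mentions. It is not CoreNLP's implementation: the class MentionGrouper and the method groupMentions are hypothetical names, and the rule used (merge consecutive tokens that share the same non-"O" label) is a simplifying assumption. The input mirrors the "word" and "ner" fields from the response above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: merges consecutive tokens
// that carry the same non-"O" NER label into one mention string.
class MentionGrouper {

    // Each row is {word, nerTag}, like "word" and "ner" in the JSON tokens.
    static List<String> groupMentions(String[][] tokens) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";

        for (String[] token : tokens) {
            String word = token[0];
            String tag = token[1];
            if (!tag.equals("O") && tag.equals(currentTag)) {
                // Same entity type as the previous token: extend the open mention.
                current.append(' ').append(word);
                continue;
            }
            // The label changed: close the open mention, if there is one.
            if (!currentTag.equals("O")) {
                mentions.add(current + " [" + currentTag + "]");
            }
            current.setLength(0);
            if (!tag.equals("O")) {
                current.append(word);
            }
            currentTag = tag;
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(current + " [" + currentTag + "]");
        }
        return mentions;
    }

    public static void main(String[] args) {
        String[][] tokens = {
            {"Barack", "PERSON"}, {"Obama", "PERSON"}, {"was", "O"},
            {"President", "O"}, {"of", "O"}, {"the", "O"},
            {"United", "LOCATION"}, {"States", "LOCATION"},
            {"of", "LOCATION"}, {"America", "LOCATION"},
            {"in", "O"}, {"2016", "DATE"}
        };
        // Prints:
        // Barack Obama [PERSON]
        // United States of America [LOCATION]
        // 2016 [DATE]
        groupMentions(tokens).forEach(System.out::println);
    }
}
```

With the entitymentions annotator enabled you should not need anything like this on the client side, since the server does the grouping for you; the sketch only shows why the ner output alone looks "split".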
