java


Getting text from a website using JSoup


I’m using JSoup to parse an HTML page.
As an example, I want to extract an article from Wikipedia.
Specifically, I would like to get the text of the “From today’s featured article” table on the main page (http://en.wikipedia.org/wiki/Main_Page).
Here’s the code:
Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page");
Elements el = doc.select("div.mp-tfa");
System.out.println(el);
The problem is that it doesn’t work properly: it prints just a blank line.
The “From today’s featured article” table sits inside div class="mp-tfa".
How can I get this text in my Java program?
Thanks in advance.
Change the class selector:
doc.select("div.mp-tfa");
To the id selector:
doc.select("div#mp-tfa");
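To see why the selector matters, here is a minimal sketch that parses a small inline HTML string instead of the live page. In a CSS selector, "." matches a class and "#" matches an id. The markup and the class name SelectorDemo are made up for illustration; the sketch assumes only that the target div carries mp-tfa as its id, as the fix above implies.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical markup standing in for the real page: mp-tfa is an id, not a class.
    static final String HTML =
        "<html><body><div id=\"mp-tfa\"><p>Featured article text</p></div></body></html>";

    // Returns how many elements the given CSS selector matches in the sample markup.
    static int countMatches(String cssSelector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssSelector).size();
    }

    // Returns the combined text of every element the selector matches.
    static String textOf(String cssSelector) {
        return Jsoup.parse(HTML).select(cssSelector).text();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("div.mp-tfa")); // class selector: prints 0
        System.out.println(countMatches("div#mp-tfa")); // id selector: prints 1
        System.out.println(textOf("div#mp-tfa"));       // prints "Featured article text"
    }
}
```

The class selector matches nothing, and printing an empty Elements collection yields an empty string, which is the blank line described in the question.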
A better way would be to iterate over the retrieved Elements and print the text of each one, whatever tag, class, or id you selected on:
Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get();
Elements el = doc.select("div#mp-tfa");
for (Element e : el) {
    System.out.println(e.text());
}
This prints:
The Boulonnais is a heavy draft horse breed from Fr....
I think it's supposed to be the following, with .get() added so that the page is actually fetched and parsed into a Document:
Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get();
Elements el = doc.select("div#mp-tfa");
System.out.println(el);
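For completeness, a self-contained version might look like the sketch below. Jsoup.connect(...) only builds a Connection; get() performs the request and returns the parsed Document, and it throws IOException if the request fails. The class name FeaturedArticle, the helper featuredText, and the timeout value are illustrative choices rather than part of the answers above, and the selector assumes the page still uses the mp-tfa id.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FeaturedArticle {
    // Hypothetical helper: returns the text of the div with id "mp-tfa",
    // or an empty string if the document has no such element.
    static String featuredText(Document doc) {
        Element box = doc.select("div#mp-tfa").first(); // null when nothing matches
        return box == null ? "" : box.text();
    }

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page")
                    .timeout(10_000) // milliseconds; an arbitrary illustrative value
                    .get();          // sends the request and parses the response
            System.out.println(featuredText(doc));
        } catch (IOException e) {
            System.err.println("Could not fetch the page: " + e.getMessage());
        }
    }
}
```

Keeping the extraction in its own method means it can be checked against a locally parsed string, without a network connection.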
