Spark multiple sessions vs shared global session


Question
What is the motivation for creating multiple Spark applications/sessions instead of sharing a single global session?
Explanation
Suppose you use the Spark Standalone cluster manager.
Cluster:
5 machines
2 cores each, one executor per core = 10 executors in total
16 GB RAM on each machine
Jobs:
1. Dump a database: requires all 10 executors, but only 1 GB RAM per executor.
2. Handle the dump results: requires 5 executors with 8-16 GB RAM each.
3. Quick data retrieval task: requires 5 executors with 1 GB RAM each.
etc.
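The numbers above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original question: the `ClusterFit` and `Job` names are hypothetical, it assumes executors are spread evenly across machines with one core each, it takes the upper bound (16 GB) for the second job, and it ignores the memory the OS and the worker daemon need in practice.

```java
import java.util.Arrays;
import java.util.List;

/** Back-of-the-envelope check of the job requirements against the cluster. */
final class ClusterFit {
    // Cluster figures taken from the question.
    static final int MACHINES = 5;
    static final int CORES_PER_MACHINE = 2;
    static final int RAM_GB_PER_MACHINE = 16;

    /** Hypothetical holder for one job's requirements. */
    static final class Job {
        final String name;
        final int executors;     // one core per executor, as in the question
        final int gbPerExecutor;

        Job(String name, int executors, int gbPerExecutor) {
            this.name = name;
            this.executors = executors;
            this.gbPerExecutor = gbPerExecutor;
        }
    }

    /** True if the job fits on an otherwise idle cluster (assumes even spreading). */
    static boolean fitsAlone(Job job) {
        // The busiest machine hosts ceil(executors / machines) executors.
        int perMachine = (job.executors + MACHINES - 1) / MACHINES;
        return perMachine <= CORES_PER_MACHINE
                && perMachine * job.gbPerExecutor <= RAM_GB_PER_MACHINE;
    }

    /** Necessary condition only: do the jobs' core demands fit at the same time? */
    static boolean coresFitTogether(List<Job> jobs) {
        int cores = 0;
        for (Job j : jobs) {
            cores += j.executors;
        }
        return cores <= MACHINES * CORES_PER_MACHINE;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("dump", 10, 1),
                new Job("handle", 5, 16),
                new Job("retrieve", 5, 1));
        for (Job j : jobs) {
            System.out.println(j.name + " fits alone: " + fitsAlone(j));
        }
        System.out.println("cores fit together: " + coresFitTogether(jobs));
    }
}
```

Under these assumptions each job fits on an idle cluster, but the three together ask for 20 cores on a 10-core cluster, so they cannot all hold their full allocation at once.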
Which solution is the best practice? Why should I ever prefer the 1st solution over the 2nd, or the 2nd over the 1st, if the resources of the cluster remain the same?
Solutions:
1. Launch the 1st, 2nd and 3rd jobs from different Spark applications (JVMs).
2. Use a single global Spark application/session that holds all the resources of the cluster (10 executors, each with 8 GB RAM). Create a fair scheduler pool for each of the 1st, 2nd and 3rd jobs.
3. Use some hacks like this to run jobs with different configs from a single JVM. But I'm afraid that's not a very stable solution (or, if you prefer, not one officially supported by the Spark team).
4. Spark Job Server, but as I understand it, it's an implementation of the first solution.
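For the first solution, each application carries its own resource settings. Below is a minimal sketch that only assembles `spark-submit` command lines, so it runs without Spark on the classpath. The `SubmitArgs` class, the master URL, the main class names and the jar name are all hypothetical; `spark.cores.max`, `spark.executor.cores` and `spark.executor.memory` are standard Spark configuration properties.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds one spark-submit command line per job (solution 1: separate applications). */
final class SubmitArgs {
    // Hypothetical standalone master URL.
    static final String MASTER = "spark://master-host:7077";

    static List<String> forJob(int totalCores, String executorMemory,
                               String mainClass, String jar) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add(MASTER);
        // Cap on the total cores this application may take from the cluster.
        args.add("--conf");
        args.add("spark.cores.max=" + totalCores);
        // One core per executor, so the number of executors equals the core cap.
        args.add("--conf");
        args.add("spark.executor.cores=1");
        // Memory per executor differs per job; this is what pools cannot express.
        args.add("--conf");
        args.add("spark.executor.memory=" + executorMemory);
        args.add("--class");
        args.add(mainClass);
        args.add(jar);
        return args;
    }

    public static void main(String[] ignored) {
        // Class and jar names are placeholders for the three jobs in the question.
        System.out.println(String.join(" ", forJob(10, "1g", "example.DumpJob", "jobs.jar")));
        System.out.println(String.join(" ", forJob(5, "8g", "example.HandleJob", "jobs.jar")));
        System.out.println(String.join(" ", forJob(5, "1g", "example.RetrieveJob", "jobs.jar")));
    }
}
```

The point of the sketch is that memory per executor is fixed per application at submit time, which is why jobs with very different memory needs tend to end up as separate applications.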
Update
It looks like the 2nd option (a global session holding all resources plus fair scheduler pools) isn't possible, because in pool.xml you can configure only the number of cores per pool (minShare), not the memory per executor.
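To make the limitation concrete, here is a small sketch that renders a fair scheduler allocation file from Java. The `FairPoolXml` class, the pool names and the numeric values are illustrative assumptions; the three elements per pool (`schedulingMode`, `weight`, `minShare`) are the settings Spark documents for a pool, and none of them concerns executor memory.

```java
/** Renders a fair scheduler allocation file with the three documented pool settings. */
final class FairPoolXml {

    /** One <pool> element; minShare is a number of CPU cores, not memory. */
    static String pool(String name, String schedulingMode, int weight, int minShare) {
        return "  <pool name=\"" + name + "\">\n"
                + "    <schedulingMode>" + schedulingMode + "</schedulingMode>\n"
                + "    <weight>" + weight + "</weight>\n"
                + "    <minShare>" + minShare + "</minShare>\n"
                + "  </pool>\n";
    }

    /** Wraps the given pool elements in the <allocations> root element. */
    static String allocations(String... pools) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<allocations>\n");
        for (String p : pools) {
            sb.append(p);
        }
        return sb.append("</allocations>\n").toString();
    }

    public static void main(String[] args) {
        // Illustrative pools for the three jobs; the minShare values sum to the 10 cores.
        System.out.print(allocations(
                pool("dump", "FAIR", 1, 5),
                pool("handle", "FAIR", 1, 3),
                pool("retrieve", "FAIR", 1, 2)));
    }
}
```

In a real application the file is referenced through the `spark.scheduler.allocation.file` property (with `spark.scheduler.mode` set to `FAIR`), and a thread selects its pool with `setLocalProperty("spark.scheduler.pool", name)` on the Spark context. All pools still share executors whose memory was fixed when the application started.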
