java
Spark multiple sessions vs shared global session
Question

What is the motivation for creating multiple Spark applications/sessions instead of sharing one global session?

Explanation

You have the Spark Standalone cluster manager.

Cluster:
- 5 machines
- 2 cores each, one executor per core = 10 executors in total
- 16 GB RAM on each machine

Jobs:
1. Dump database: requires all 10 executors, but only 1 GB RAM on each executor.
2. Handle dump results: requires 5 executors with 8-16 GB RAM each.
3. Quick data retrieval task: 5 executors with 1 GB RAM each.
4. etc.

Which solution is the best practice? Why should I ever prefer the 1st solution over the 2nd, or the 2nd over the 1st, if the resources of the cluster remain the same?

Solutions:
1. Launch the 1st, 2nd and 3rd jobs from different Spark applications (JVMs).
2. Use a single global Spark application/session which holds all resources of the cluster (10 executors, each with 8 GB RAM). Create a fair scheduler pool for each of the 1st, 2nd and 3rd jobs.
3. Use some hacks like this to run jobs with different configs from a single JVM. But I'm afraid that is not a very stable solution (that is, not officially supported by the Spark team).
4. Spark Job Server, but as I understand it, that is an implementation of the first solution.

Update

It looks like the 2nd option (global session with all resources + fair scheduler pool) isn't possible, because in pool.xml you can configure only the number of cores (minShare), but not the memory per executor.
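To make the first solution concrete, here is a minimal sketch of how the three jobs might be submitted as separate applications, each with its own executor sizing. It only assembles `spark-submit` argument lists and does not launch anything. The master URL, jar name and main class names are hypothetical placeholders, and the sketch assumes one core per executor, so that on a standalone cluster the executor count roughly follows from `--total-executor-cores` divided by `--executor-cores`.

```java
import java.util.ArrayList;
import java.util.List;

public class SubmitCommands {

    // Builds the spark-submit argument list for one application.
    // totalCores caps the cores the application may take across the cluster;
    // executorMemory is the heap size requested for each executor.
    static List<String> build(String mainClass, int totalCores,
                              int coresPerExecutor, String executorMemory) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add("spark://master-host:7077"); // hypothetical master URL
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--total-executor-cores");
        cmd.add(String.valueOf(totalCores));
        cmd.add("--executor-cores");
        cmd.add(String.valueOf(coresPerExecutor));
        cmd.add("--executor-memory");
        cmd.add(executorMemory);
        cmd.add("jobs.jar"); // hypothetical application jar
        return cmd;
    }

    public static void main(String[] args) {
        // Job 1: database dump, 10 small executors.
        System.out.println(String.join(" ",
                build("com.example.DumpDatabase", 10, 1, "1g")));
        // Job 2: handling the dump results, 5 large executors.
        System.out.println(String.join(" ",
                build("com.example.HandleDump", 5, 1, "8g")));
        // Job 3: quick data retrieval, 5 small executors.
        System.out.println(String.join(" ",
                build("com.example.QuickRetrieval", 5, 1, "1g")));
    }
}
```

The point of the sketch is that executor memory is fixed per application at submit time, which is why jobs with very different memory needs map naturally onto separate applications.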