Below are some notes taken for future reference, based on last week's brainstorm meeting, with company-confidential information removed.

Background

The team uses a homemade workflow to manage the cost and profit computation, but there are no statistics for the jobs or their input/output. Usually an SDE or the on-call engineer checks the data manually in the Data Warehouse or in TSV files on S3. For EMR jobs, the Spark UI and Ganglia are both powerful, but once the clusters are terminated, all of this valuable metrics data is gone.

Typical use cases:

Spark metrics: status / efficiency / executor / GC …
EMR clus
...