Tagged: etl

Apache Spark: Convert CSV to RDD

Below is a simple Spark / Scala example showing how to convert a CSV file to an RDD and perform some simple filtering. The example transforms each line of the CSV into a Map of the form header-name -> data-value. Each map key corresponds to a header name, and each map value is the value of that column on the given line. This particular example also assumes that the header information is...
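
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the post's original code. It assumes that the header is the first line of the file, that fields are separated by plain commas with no quoted or embedded commas, and that Spark runs in local mode. The file name `people.csv`, the column name `city`, the value `Boston`, and the helper `toRecord` are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CsvToRdd {
  // Pair each header name with the value in the same column position.
  // Naive comma split: quoted fields containing commas are NOT handled.
  def toRecord(header: Array[String], line: String): Map[String, String] =
    header.zip(line.split(",", -1).map(_.trim)).toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("csv-to-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path; each RDD element is one line of the file.
      val lines: RDD[String] = sc.textFile("people.csv")

      // Assumption: the first line of the file holds the column names.
      val headerLine = lines.first()
      val header = headerLine.split(",", -1).map(_.trim)

      // Drop every line identical to the header, then build one Map per row.
      val records: RDD[Map[String, String]] =
        lines.filter(_ != headerLine).map(line => toRecord(header, line))

      // Simple filtering: keep rows whose (hypothetical) "city" column is "Boston".
      val filtered = records.filter(_.get("city").contains("Boston"))

      filtered.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Looking a field up with `get` returns an `Option`, so rows that lack the column are simply filtered out rather than throwing. For real-world CSV with quoting or escaped delimiters, a proper CSV parser would be a safer choice than `split`.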

