
Apache Spark: Convert CSV to RDD

Below is a simple Spark/Scala example showing how to convert a CSV file to an RDD and perform some simple filtering. The example transforms each line of the CSV into a Map of the form header-name -> data-value: each map key is a header name, and each value is the data found under that header on the specific line. This particular example also assumes that the header information is...
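As a rough illustration of the line-to-Map idea described above, here is a minimal sketch, not the original example's code. It assumes, purely for illustration, that the header is the first line of the file, that fields are separated by plain commas with no quoting, and that there is a hypothetical `people.csv` with an `age` column. The names `CsvToRdd` and `lineToMap` are invented for this sketch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.util.Try

object CsvToRdd {
  // Pair each header name with the value in the same column of one line.
  // Naive split: quoted fields that contain commas are NOT handled.
  def lineToMap(header: Array[String], line: String): Map[String, String] =
    header.zip(line.split(",", -1).map(_.trim)).toMap

  def main(args: Array[String]): Unit = {
    // Local mode, so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("csv-to-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path; replace with a real file.
      val lines: RDD[String] = sc.textFile("people.csv")

      // Assumption: the first line of the file holds the header names.
      val headerLine = lines.first()
      val header = headerLine.split(",", -1).map(_.trim)

      // Drop the header line (and any data line identical to it),
      // then turn every remaining line into a header-name -> value Map.
      val rows: RDD[Map[String, String]] = lines
        .filter(_ != headerLine)
        .map(line => lineToMap(header, line))

      // Simple filtering: keep rows whose "age" parses as an Int >= 18.
      val adults = rows.filter { row =>
        row.get("age").flatMap(a => Try(a.toInt).toOption).exists(_ >= 18)
      }

      adults.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping each row as a Map lets the filter refer to columns by name rather than by position, at the cost of repeating the header names in every record. For real-world CSV with quoting or embedded newlines, a dedicated CSV parser would be a safer choice than `split`.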