Dynamically select columns by type

Feb 22, 2019·
Dr. Georg Heiler
· 1 min read

In pandas it is really easy to select only the columns matching a certain data type:

df.select_dtypes(include=['float64'])
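`select_dtypes` also accepts several types at once, or an `exclude` list — a quick sketch with a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "count": [2], "name": ["hello"]})

# select several numeric types at once
numeric = df.select_dtypes(include=["int64", "float64"])

# or keep everything except object (string) columns
non_string = df.select_dtypes(exclude=["object"])

print(list(numeric.columns))     # → ['id', 'count']
print(list(non_string.columns))  # → ['id', 'count']
```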

Spark does not ship such a function out of the box, but it is easy to write one by hand:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, IntegerType}

// assumes a SparkSession available as `spark` (e.g. in spark-shell)
import spark.implicits._

val df = Seq(
  (1, 2, "hello")
).toDF("id", "count", "name")

// Select all columns whose data type matches colType.
def selectByType(colType: DataType, df: DataFrame): DataFrame = {
  val cols = df.schema
    .filter(field => field.dataType == colType)
    .map(field => col(field.name))
  df.select(cols: _*)
}

val res = selectByType(IntegerType, df) // keeps id and count
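The same idea generalizes to selecting several types at once — a sketch under the same assumptions (a running SparkSession; `selectByTypes` is a hypothetical helper, not a Spark API):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

// Keep every column whose data type is in the given set.
def selectByTypes(colTypes: Set[DataType], df: DataFrame): DataFrame = {
  val cols = df.schema
    .filter(field => colTypes.contains(field.dataType))
    .map(field => col(field.name))
  df.select(cols: _*)
}

// e.g. selectByTypes(Set(IntegerType, StringType), df)
// would keep id, count and name from the frame above
```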