scala - How do I filter rows based on whether a column value is in a Set of Strings in a Spark DataFrame?
Is there a more elegant way of filtering based on values in a Set of Strings than this?
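import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

def myFilter(actions: Set[String], myDf: DataFrame): DataFrame = {
  // Wrap Set.contains in a UDF and apply it to the "action" column
  val containsAction = udf((action: String) => actions.contains(action))
  myDf.filter(containsAction(col("action")))
}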
In SQL I can do:
SELECT * FROM mytable WHERE action IN ('action1', 'action2', 'action3')
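For comparison, the SQL query above can be run against a DataFrame by registering it as a temporary view. This is a minimal sketch, assuming spark-sql is on the classpath and a SparkSession is available; the helper name `filterWithSql` and the view name `mytable` are hypothetical. Because the values are spliced into the query text, the sketch rejects quotes and backslashes instead of escaping them, so it is not meant for untrusted input.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlInFilter {
  // Hypothetical helper: runs SELECT * FROM mytable WHERE action IN (...)
  // against myDf. The view name "mytable" and column "action" are assumptions.
  def filterWithSql(spark: SparkSession, myDf: DataFrame, actions: Set[String]): DataFrame = {
    // An empty IN list is not valid SQL, so require at least one value.
    require(actions.nonEmpty, "actions must not be empty")
    // Values are interpolated into the query string; reject characters
    // that would need escaping rather than trying to escape them.
    require(actions.forall(a => !a.contains("'") && !a.contains("\\")),
      "actions must not contain quotes or backslashes")
    myDf.createOrReplaceTempView("mytable")
    // Build a list such as 'action1', 'action2'
    val inList = actions.toSeq.sorted.map(a => s"'$a'").mkString(", ")
    spark.sql(s"SELECT * FROM mytable WHERE action IN ($inList)")
  }
}
```

The DataFrame API approaches below avoid building SQL strings altogether, which is why they are usually preferable when the values come from a Scala collection.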
How about this:
myDf.filter("action IN ('action1', 'action2')")
or
import org.apache.spark.sql.functions.{col, lit}
// Spark 1.5+: Column.isin replaces the older Column.in
myDf.where(col("action").isin(Seq("action1", "action2").map(lit(_)): _*))
or
import org.apache.spark.sql.functions.{col, lit}
myDf.where(col("action").isin(Seq(lit("action1"), lit("action2")): _*))
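Applied to the original question, the UDF can be dropped entirely: `Column.isin` accepts plain values as varargs, so the `Set[String]` only needs to be expanded with `: _*`. This is a sketch assuming Spark 1.5 or later with spark-sql on the classpath; the wrapping object `ActionFilter` is hypothetical, and the column name `action` comes from the question.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object ActionFilter {
  // Keeps only rows whose "action" value is one of `actions`.
  // isin builds Spark's built-in IN predicate, which the optimizer can
  // inspect, whereas a UDF is a black box to it.
  def myFilter(actions: Set[String], myDf: DataFrame): DataFrame =
    myDf.filter(col("action").isin(actions.toSeq: _*))
}
```

On Spark 2.4 and later, `col("action").isInCollection(actions)` expresses the same filter without the varargs expansion.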