Randomly Populating PySpark Columns
Sometimes we need to create synthetic data for testing. The following snippet shows how to add a new column populated with randomly chosen discrete values:
from pyspark.sql import functions as F

# Build a literal array of the possible values, then pick one element per row
# using a random index in [0, 3). DataFrames are immutable, so assign the result.
df = df.withColumn(
    "business_vertical",
    F.array(
        F.lit("Retail"),
        F.lit("SME"),
        F.lit("Cor"),
    ).getItem(
        (F.rand() * 3).cast("int")
    ),
)
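To try this end to end, a minimal sketch might look like the following; the SparkSession setup and the 1,000-row dummy DataFrame are assumptions for illustration, not part of the original snippet. The final groupBy is a quick sanity check that the three values land in roughly equal proportions.

from pyspark.sql import SparkSession, functions as F

# Assumed setup for illustration: a local session and a throwaway
# 1,000-row DataFrame with a single "id" column.
spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)

df = df.withColumn(
    "business_vertical",
    F.array(F.lit("Retail"), F.lit("SME"), F.lit("Cor"))
     .getItem((F.rand() * 3).cast("int")),
)

# Each value should appear for roughly a third of the rows.
df.groupBy("business_vertical").count().show()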