

It helps if you specify the output you want in your question, or what you'll be using the output for, but the below should cover most use cases:

```python
from pyspark.sql import functions as F

# min, max and avg of every column
cols_to_agg = [f(c) for c in df.columns for f in (F.min, F.max, F.avg)]

df.agg(*cols_to_agg)
```

which gives a single row of output:

|min(Status)|max(Status)|avg(Status)|min(column1)|max(column1)|avg(column1)|min(column2)|max(column2)|avg(column2)|min(column3)|max(column3)|avg(column3)|min(column4)|max(column4)|avg(column4)|

Or, as Igor mentioned, you could do a groupBy to get a more granular breakdown:

```python
df.groupBy('status').agg(*cols_to_agg)
```

|min(column1)|max(column1)|avg(column1)|min(column2)|max(column2)|avg(column2)|min(column3)|max(column3)|avg(column3)|min(column4)|max(column4)|avg(column4)|

Or, if you want both, use a rollup, as this will give the result of both of the above in a single aggregation and output (see the sketch below).

You can explicitly use select expressions to get the results; df.describe() uses native Spark functions to run the computation.
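A minimal sketch of the rollup route, reusing `cols_to_agg` from above (the `'status'` column name is assumed from the examples):

```python
# rollup('status') returns the per-status aggregates plus an extra row with
# status = null holding the overall aggregates, i.e. both of the outputs
# above in a single pass.
df.rollup('status').agg(*cols_to_agg).show()
```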

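For the last point, a small sketch of what explicit select expressions could look like next to `df.describe()` (column names are taken from the output above):

```python
# Explicit select expressions for one column...
df.selectExpr("min(column1)", "max(column1)", "avg(column1)").show()

# ...versus describe(), which computes count/mean/stddev/min/max per column
# using native Spark functions.
df.describe().show()
```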