Rename Columns With Special Characters In Python Or Pyspark Dataframe

I have a DataFrame in Python/PySpark. The column names contain special characters such as dots (.), spaces, parentheses (()), and braces ({}). I want to rename the columns so that the names no longer contain these characters.

Solution 1:

If you have a PySpark DataFrame, you can use the withColumnRenamed function to rename the columns. Here is what I tried; adapt the replacement rules to your needs.

>>> from pyspark.sql.types import StructType, StructField, StringType
>>> l = [('some value1', 'some value2', 'some value 3'), ('some value4', 'some value5', 'some value 6')]
>>> l_schema = StructType([StructField("col1.some valwith(in)and{around}", StringType(), True),
...                        StructField("col2.some valwith()and{}", StringType(), True),
...                        StructField("col3 some()valwith.and{}", StringType(), True)])
>>> # (old, new) replacement pairs, applied in order
>>> reps = ('.', '_'), (' ', '_'), ('(', ''), (')', ''), ('{', ''), ('}', '')
>>> rdd = sc.parallelize(l)
>>> df = sqlContext.createDataFrame(rdd, l_schema)
>>> df.printSchema()
root
 |-- col1.some valwith(in)and{around}: string (nullable = true)
 |-- col2.some valwith()and{}: string (nullable = true)
 |-- col3 some()valwith.and{}: string (nullable = true)

>>> df.show()
+--------------------------------+------------------------+------------------------+
|col1.some valwith(in)and{around}|col2.some valwith()and{}|col3 some()valwith.and{}|
+--------------------------------+------------------------+------------------------+
|                     some value1|             some value2|            some value 3|
|                     some value4|             some value5|            some value 6|
+--------------------------------+------------------------+------------------------+

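>>> from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
>>> def colrename(x):
...     # apply every (old, new) pair in reps to the column name x
...     return reduce(lambda a, kv: a.replace(*kv), reps, x)
...
>>> for i in df.schema.names:
...     df = df.withColumnRenamed(i, colrename(i))
...
>>> df.printSchema()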
root
 |-- col1_some_valwithinandaround: string (nullable = true)
 |-- col2_some_valwithand: string (nullable = true)
 |-- col3_somevalwith_and: string (nullable = true)

>>> df.show()
+----------------------------+--------------------+--------------------+
|col1_some_valwithinandaround|col2_some_valwithand|col3_somevalwith_and|
+----------------------------+--------------------+--------------------+
|                 some value1|         some value2|        some value 3|
|                 some value4|         some value5|        some value 6|
+----------------------------+--------------------+--------------------+
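The renaming logic above can be tried without a Spark session, since it only operates on the column name strings. Below is a minimal sketch under that assumption; `clean_name` and `rename_map` are hypothetical helper names, and the uniqueness check is an extra safeguard that is not part of the original answer (two different names such as "a.b" and "a b" would both become "a_b").

```python
from functools import reduce

# (old, new) pairs from Solution 1, applied left to right.
REPS = (('.', '_'), (' ', '_'), ('(', ''), (')', ''), ('{', ''), ('}', ''))

def clean_name(name, reps=REPS):
    """Apply each (old, new) replacement to name, in order."""
    return reduce(lambda acc, kv: acc.replace(*kv), reps, name)

def rename_map(columns, reps=REPS):
    """Return {old_name: new_name}; raise if two cleaned names collide.

    The collision check is an added assumption, not part of the answer.
    """
    mapping = {c: clean_name(c, reps) for c in columns}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("cleaned column names are not unique")
    return mapping

cols = [
    "col1.some valwith(in)and{around}",
    "col2.some valwith()and{}",
    "col3 some()valwith.and{}",
]
print(rename_map(cols))

# With a PySpark DataFrame df, the mapping could then be applied as:
#   for old, new in rename_map(df.columns).items():
#       df = df.withColumnRenamed(old, new)
```

Building the mapping first makes it easy to inspect the new names before touching the DataFrame.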

Solution 2:

Python 3.x solution:

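import re

# Map each bracket character to None so that translate() deletes it.
tran_tab = str.maketrans({x: None for x in '(){}'})

# Collapse runs of dots/whitespace into one underscore, then strip the brackets.
df1 = df.toDF(*(re.sub(r'[\.\s]+', '_', c).translate(tran_tab) for c in df.columns))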

Python 2.x solution:

df1 = df.toDF(*(re.sub(r'[\.\s]+', '_', c).translate(None, '(){}') for c in df.columns))
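The regex-based cleaning in Solution 2 can also be checked on plain strings before calling toDF. This is a rough Python 3 sketch; `clean_name_regex` is a hypothetical helper name, and the three-argument form of `str.maketrans` is used here as an equivalent way to delete the bracket characters.

```python
import re

# Translation table that deletes the four bracket characters.
_DROP = str.maketrans('', '', '(){}')

def clean_name_regex(name):
    """Replace each run of dots/whitespace with one '_', then drop brackets."""
    return re.sub(r'[.\s]+', '_', name).translate(_DROP)

print(clean_name_regex("col3 some()valwith.and{}"))  # col3_somevalwith_and
print(clean_name_regex("a. b"))                      # a_b

# With a PySpark DataFrame df, this could be applied in one call:
#   df1 = df.toDF(*(clean_name_regex(c) for c in df.columns))
```

One difference from Solution 1 is worth noting: because the pattern ends in `+`, adjacent dots and spaces collapse into a single underscore ("a. b" becomes "a_b"), whereas the chained `replace` calls in Solution 1 would give "a__b".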
