
PCollection To Array - How To Dynamically Input A Header Into A WriteToText PTransform?

I am writing a Dataflow job using Apache Beam 2.19, running primarily on the Dataflow runner. I am attempting to transform a BigQuery input with nested and repeated fields to a flat

Solution 1:

Unfortunately, this is currently unsupported. The file header passed to WriteToText can only be specified at pipeline construction time, so the best solution at the moment is to generate the header you need at construction time rather than at execution time.
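As a rough illustration of the construction-time approach, here is a minimal sketch. It assumes the column names are already known before the pipeline is built; the `FIELDS` list, `to_csv_line`, and `build_write` are hypothetical names made up for this example, not part of Beam. The only Beam-specific piece it relies on is the `header` argument of `WriteToText`, which takes a plain string.

```python
import csv
import io

# Hypothetical column names. The assumption is that they are known before the
# pipeline is built (for example from a config file or a schema lookup).
FIELDS = ["id", "name", "score"]


def to_csv_line(row, fields=FIELDS):
    """Format one flat dict as a single CSV line without a trailing newline."""
    buf = io.StringIO()
    csv.writer(buf).writerow([row.get(f, "") for f in fields])
    return buf.getvalue().rstrip("\r\n")


def build_write(rows, output_prefix):
    """Attach CSV formatting and the write step to `rows`, a PCollection of dicts."""
    # Imported lazily so the pure helper above can be used without Beam installed.
    import apache_beam as beam

    header = ",".join(FIELDS)  # an ordinary string, fixed at construction time
    return (
        rows
        | "ToCsv" >> beam.Map(to_csv_line)
        | "Write" >> beam.io.WriteToText(
            output_prefix, file_name_suffix=".csv", header=header)
    )
```

Because the header is part of the sink configuration, every output shard should start with the same header line.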

That said, you may be able to "cheat" your way to the same result. For example, you could write a CombineFn that combines all the elements destined for the TextIO into a single string containing the CSV body. Then send that string to a ParDo that takes the dictionary keys as a side input and prepends them to the CSV body as a header, and finally send the resulting string, which represents your whole file, to your TextIO transform.
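The workaround described above could be sketched roughly as follows. This is an assumption-laden outline, not a tested Dataflow job: `to_csv_line`, `join_lines`, `prepend_header`, and `write_with_dynamic_header` are hypothetical names, the column names are discovered by collecting the distinct dict keys at run time, and the combine step is a plain callable passed to `CombineGlobally` (which Beam wraps in a CombineFn) rather than a hand-written CombineFn class.

```python
import csv
import io


def to_csv_line(row, fields):
    """Format one flat dict as a single CSV line without a trailing newline."""
    buf = io.StringIO()
    csv.writer(buf).writerow([row.get(f, "") for f in fields])
    return buf.getvalue().rstrip("\r\n")


def join_lines(lines):
    """Combine step: merge CSV lines (or partial bodies) into one body.

    Empty strings are skipped so that empty partial results do not add blank
    lines; a row that renders as an empty line would be dropped as well.
    """
    return "\n".join(line for line in lines if line)


def prepend_header(body, fields):
    """Put the header line, built from the field names, in front of the body."""
    header = to_csv_line(dict(zip(fields, fields)), fields)
    return header + "\n" + body if body else header


def write_with_dynamic_header(rows, output_prefix):
    """`rows` is a PCollection of flat dicts; the header is derived at run time."""
    # Imported lazily so the pure helpers above can be used without Beam installed.
    import apache_beam as beam

    # Single-element PCollection holding the sorted list of all keys seen.
    fields = (
        rows
        | "Keys" >> beam.FlatMap(lambda row: list(row))
        | "DistinctKeys" >> beam.Distinct()
        | "KeyList" >> beam.combiners.ToList()
        | "SortKeys" >> beam.Map(lambda keys: sorted(keys))
    )
    fields_view = beam.pvalue.AsSingleton(fields)

    return (
        rows
        | "ToCsv" >> beam.Map(to_csv_line, fields=fields_view)
        | "JoinBody" >> beam.CombineGlobally(join_lines)
        | "AddHeader" >> beam.Map(prepend_header, fields=fields_view)
        # The whole file is now one element, so write it as a single shard.
        | "Write" >> beam.io.WriteToText(
            output_prefix, file_name_suffix=".csv",
            num_shards=1, shard_name_template="")
    )
```

Note that the combine step does not preserve row order, and the entire file has to fit in memory on one worker as a single string, which is part of why this approach is brittle.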

To reiterate, that's a bit of a cheat to get around the lack of support, and it's probably more brittle and less performant than a natively supported dynamic header would be. If you are able to avoid the issue by generating the header at pipeline construction time instead, that's far better.
