Spark feature gives "No module named 'bamboo'"
When running bamboo with the Spark job distribution feature, I received `ModuleNotFoundError: No module named 'bamboo'`. This happened because I was using an editable installation of bamboo. An editable install only links the package to its original source location instead of copying it into the virtual environment, so Spark is unaware of that path. Spark in this case sees only what is installed inside the virtual environment, not outside it.
One solution is to install bamboo the usual (non-editable) way, if an editable installation is not really needed. Another option is to set PYTHONPATH on the machine.
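As a sketch of both options (the source path `/path/to/bamboo` is an assumption about my local checkout, not a fixed location):

```shell
# Option 1: reinstall bamboo as a regular package, so its files are
# copied into the virtual environment that Spark actually sees.
pip uninstall -y bamboo
pip install /path/to/bamboo   # note: no -e flag this time

# Option 2: keep the editable install, but expose the source checkout
# on PYTHONPATH so the interpreter can resolve the module anyway.
export PYTHONPATH="/path/to/bamboo:$PYTHONPATH"
```

Note that the `export` only affects the current shell session; to make it permanent it would need to go into a shell profile or the Spark environment configuration.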
Alternatively, bamboo could be installed on the Spark worker nodes, as indicated here, which may require some additions to the bamboo code.
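A related approach that avoids touching the worker nodes directly is to ship the package with the job using Spark's standard `--py-files` mechanism; a sketch, assuming a plain-Python bamboo package (the paths and the job script name `my_job.py` are placeholders):

```shell
# Bundle the bamboo package directory into a zip archive.
cd /path/to/bamboo && zip -r /tmp/bamboo.zip bamboo/

# --py-files distributes the archive to every executor and adds it
# to the workers' PYTHONPATH before the job runs.
spark-submit --py-files /tmp/bamboo.zip my_job.py
```

This only works cleanly if bamboo is pure Python; compiled extensions or native dependencies would still need a proper installation on the workers.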
Edited by Oguz Guzel