WIP: Remember to add query when opening file
## Note

**WARNING: THIS WON'T WORK FOR HDFS-DFS/SPARK - REQUIRES A REFACTOR**
## Description

This fixes the `.open` call in
https://gitlab.cern.ch/db/cerndb-hadoop-backup/-/blob/qa/eosarchive/eosarc.py#L523-526
so that the query part of the URL (e.g. `?svcClass=backup`) is preserved when the file is opened.
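A minimal sketch (plain Python, hypothetical, not the connector code) of the behaviour the fix needs: the opaque query on the XRootD URL must survive any path handling before `open`, since `?svcClass=backup` selects the CASTOR service class:

```python
from urllib.parse import urlparse

# The URL used in the test below; the query selects the CASTOR service class.
url = ("root://castorpublic.cern.ch//castor/cern.ch/hdfsback/"
       "nxcals_prod/1542419455881/part-000012?svcClass=backup")

parsed = urlparse(url)
# Any path manipulation must re-attach parsed.query before open(),
# otherwise the backend never sees svcClass=backup.
path_with_query = f"{parsed.scheme}://{parsed.netloc}{parsed.path}?{parsed.query}"
assert path_with_query == url
```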
## How tested?
### Build the connector

```shell
$ docker build -t hadoop-xrootd-connector ./docker
$ docker run --rm -it -v $(pwd):/build hadoop-xrootd-connector bash
$ make package
```
### Start Spark in debug mode

```shell
$ kinit hbackup@CERN.CH
$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ export Xrd_debug=1
$ export SPARK_DIST_CLASSPATH="$(pwd)/*:$(hadoop classpath)"
$ pyspark \
    --master local[*] \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/root/log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/root/log4j.properties"
```
### Try to read

```python
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
    sc._jvm.java.net.URI("root://castorpublic.cern.ch"),
    sc._jsc.hadoopConfiguration(),
)
file = fs.open(sc._jvm.org.apache.hadoop.fs.Path(
    'root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup'
))
file.read(0, bytearray(), 0, 1)
```
We should see that the C++ layer handles the `?svcClass` query properly:
```
20/05/22 16:17:20 DEBUG HadoopXRootD: EOSfs open root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup with readBufferSize=131072
initFile: File structure created, addr=0x7fe37cb53e20
openFile 140615026556448: 'root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup'
openFile: flags 0 accessmode 0
readFile 3776157455237040: status [SUCCESS] bytes read 1
XrdClFile.Read 140615026556448 return 1 bytes
20/05/22 16:17:21 DEBUG HadoopXRootD: EOSInputStream.read(pos=0, b, off=0, len=1) readBytes: 1 elapsed: 16569
1
```
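For reference, the `EOSInputStream.read(pos=0, b, off=0, len=1)` call seen in the log follows Hadoop's `PositionedReadable.read(position, buffer, offset, length)` contract. A plain-Python stand-in for those semantics (hypothetical illustration, not the connector code):

```python
import io

def positioned_read(stream, position, buf, offset, length):
    """Mimic Hadoop's PositionedReadable.read(long, byte[], int, int):
    read up to `length` bytes at `position` into `buf[offset:]`,
    return the number of bytes read, and leave the stream cursor unmoved."""
    saved = stream.tell()
    try:
        stream.seek(position)
        data = stream.read(length)
        buf[offset:offset + len(data)] = data
        return len(data)
    finally:
        stream.seek(saved)

s = io.BytesIO(b"hello")
buf = bytearray(4)
n = positioned_read(s, 1, buf, 0, 3)
# n == 3, buf now starts with b"ell", cursor still at 0
```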