
WIP: Remember to add query when opening file

Piotr Mrowczynski requested to merge fix-svc into qa

Note

**WARNING: THIS WON'T WORK FOR HDFS-DFS/SPARK - REQUIRES A REFACTOR**

Description

This fixes the .open call in https://gitlab.cern.ch/db/cerndb-hadoop-backup/-/blob/qa/eosarchive/eosarc.py#L523-526 so that the query part of the URL (e.g. ?svcClass=backup) is passed through when opening a file.
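A minimal sketch of the idea behind the fix, with a hypothetical helper name (the actual change lives in eosarc.py): when constructing the path handed to fs.open, the query string carrying svcClass must be kept rather than stripped, since XRootD forwards it as opaque info that CASTOR uses to pick the service class.

```python
from urllib.parse import urlsplit

def build_open_url(url):
    """Return the URL to pass to fs.open, keeping any ?svcClass=... query.

    Hypothetical helper illustrating the fix: if the query part is
    dropped when the path is rebuilt, CASTOR never sees the requested
    service class.
    """
    parts = urlsplit(url)
    if parts.query:
        # Re-attach the query so it survives as XRootD opaque info.
        return f"{parts.scheme}://{parts.netloc}{parts.path}?{parts.query}"
    return url

url = ("root://castorpublic.cern.ch//castor/cern.ch/hdfsback/"
       "nxcals_prod/1542419455881/part-000012?svcClass=backup")
print(build_open_url(url))
```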

How tested?

Build connector

$ docker build \
-t hadoop-xrootd-connector ./docker
  
$ docker run --rm -it -v $(pwd):/build hadoop-xrootd-connector bash
 
make package

Start Spark in debug mode

kinit hbackup@CERN.CH
 
export HADOOP_ROOT_LOGGER=hadoop.root.logger=DEBUG,console
export Xrd_debug=1
export SPARK_DIST_CLASSPATH="$(pwd)/*:$(hadoop classpath)"
 
pyspark \
--master local[*] \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/root/log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/root/log4j.properties"

Try to read


fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
    sc._jvm.java.net.URI("root://castorpublic.cern.ch"),
    sc._jsc.hadoopConfiguration(),
)

file = fs.open(sc._jvm.org.apache.hadoop.fs.Path('root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup'))

file.read(0, bytearray(1), 0, 1)

We should see that the C++ layer handles ?svcClass properly!

20/05/22 16:17:20 DEBUG HadoopXRootD: EOSfs open root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup with readBufferSize=131072
initFile: File structure created, addr=0x7fe37cb53e20
openFile 140615026556448: 'root://castorpublic.cern.ch//castor/cern.ch/hdfsback/nxcals_prod/1542419455881/part-000012?svcClass=backup'
openFile: flags 0 accessmode 0
readFile 3776157455237040: status [SUCCESS]  bytes read 1
XrdClFile.Read 140615026556448 return 1 bytes
20/05/22 16:17:21 DEBUG HadoopXRootD: EOSInputStream.read(pos=0, b, off=0, len=1) readBytes: 1 elapsed: 16569
1
Edited by Prasanth Kothuri
