What is MockData for?
MockData is a set of tools which may make it easier to see how an XRootD endpoint reacts to different types of read load.
The endpoint is intended to be an XCache server, although any XRootD endpoint could work.
MockData transfers files from the endpoint, and when using its own specially formatted files it will also verify file integrity. The system can ingest lists of time/file/filesize entries and assign them to a variable number of workers (mockdata-loadgenerators), which can be deployed to generate large volumes of traffic.
Variable types of file access can be specified and parametrised using regular expressions on the filename. The distribution of read block sizes, how much seeking is done within the file, and overall how much of the file is read can all be configured. The amount of Vector Read, the number of XRootD sub streams, the number of overlapping IO requests per file, and the number of files per XRootD client stream can also be specified.
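For illustration, an access profile of this kind might look roughly as follows in the configuration. This is a hypothetical sketch only: apart from frac_bytes_read (referenced later in this document) the key names and layout are illustrative assumptions, not the actual configuration schema.
# hypothetical profile sketch; only frac_bytes_read is a documented name
profiles:
  - match: '.*\.root$'      # regular expression applied to the filename
    frac_bytes_read: 0.3    # read 30% of each matching file overall
    blocksize_kb: 25        # assumed name: typical read block size
    substreams: 2           # assumed name: XRootD sub streams per client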
The desired file transfer rate or duration is given by the user. If the MockData server plugin is used, data files are generated with predictable content, allowing integrity validation and avoiding the need to store a large number of files. When using the MockData plugin a latency can also be specified, allowing a basic approximation of remote access.
Results include the total amount of time the client spent waiting for IOs, allowing transfer efficiency to be measured.
Why would I need MockData?
If you would like to investigate the performance of Xcache, or of Xcache on a given hardware configuration, MockData may be of use.
Quick Start
It is assumed you have:
- An Xcache server you would like to investigate.
- A dedicated XRootD server endpoint to be used as the origin for the data to be fetched via the Xcache. The dedicated XRootD server does not need any storage for data. Your Xcache could be configured as a direct mode proxy with this server as its origin. Alternatively your Xcache could support forwarding mode, and this XRootD server may be selected as each file is opened.
Setup
- Download the mockdata-client and mockdata-coordinator RPMs and install them on a client machine. The mockdata-client RPM requires some XRootD client packages, so those will have to be installed too.
- Also download and install the mockdata-plugin RPM on your dedicated XRootD server.
Run the XRootD server using libXrdMockDataOss.so. A default configuration will be installed at /etc/xrootd/xrootd-mockdata.cfg and the service may then be started with:
systemctl restart xrootd@mockdata
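The default configuration loads the plugin as the storage (OSS) layer of the XRootD server. A minimal sketch of such a configuration is shown below; the library path and exported path are assumptions, so consult the installed /etc/xrootd/xrootd-mockdata.cfg for the authoritative settings:
# minimal sketch of an xrootd-mockdata.cfg; paths and port are assumptions
ofs.osslib /usr/lib64/libXrdMockDataOss.so   # load the MockData OSS plugin
all.export /mockdata                         # export the path used as pathprefix
xrd.port 1095                                # default XRootD port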
On your client machine you will use the following two components:
- /usr/bin/mockdata-coordinator is called the coordinator
- /usr/bin/mockdata-loadgenerator is called the load-generator
In one window run the load-generator like this:
/usr/bin/mockdata-loadgenerator <client_hostname:50051>
where client_hostname is the hostname of your client machine.
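For example, if your client machine is lg01.example.org (a hypothetical hostname), the invocation would be:
/usr/bin/mockdata-loadgenerator lg01.example.org:50051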
In another window you will prepare a configuration, use a default file-list, and then run the coordinator. Create a working directory and copy the default configuration files:
cd working_directory
cp /usr/share/doc/mockdata-*/example_config/config.yaml .
cp /usr/share/doc/mockdata-*/example_config/filelist.txt .
Edit the config.yaml file and change the targethost: line to refer to your Xcache. If you need your Xcache to use direct mode, set a suitable pathprefix, e.g.:
targethost: xcache_endpoint
pathprefix: /mockdata
If your Xcache will use forwarding mode, set a prefix such as:
targethost: xcache_endpoint
pathprefix: /root://xrootd_server_hostname//mockdata
The default filelist.txt contains only one uncommented file in the list. Run the coordinator:
/usr/bin/mockdata-coordinator
By default it will look for the configuration in config.yaml and the file-list in filelist.txt. The mockdata-loadgenerator will give some output, but all of the required output is intended to appear in the output of the mockdata-coordinator. Typical output is:
+++++++++++++++++++++++++
Time scaling factor: 1
Continuous repated replay of file list: no
Skip until time: 0
Filelist filename: filelist.txt
Configuration filename: config.yaml
+++++++++++++++++++++++++
Nov 29 11:47:24.370059 pmpe01 [DEBUG] Sent configuration to load-generator pmpe01.cern.ch#31790 v1.0.0-1
Nov 29 11:47:24.370539 pmpe01 [INFO] Starting replay, reference time 1575024459
Nov 29 11:47:24.370640 pmpe01 [INFO] Setting the startpoint timestamp from the filelist to 1567502522
Nov 29 11:47:34.730485 pmpe01 [INFO] Assigning id=0 originalStartTime=1567502522 filename=myfile1.txt intended_duration=0.000102 allowed_overrun=1.000000/600.000000 approx_start=5.000000 lgidstr=pmpe01.cern.ch#31790 dutycycle=0.000000 prev_nassigned_lg=0 nassigned_total=1 assignRateFiveMinAvgMBs=0.000000
Nov 29 11:47:39.375672 pmpe01 [INFO] Received success for myfile1.txt id=0 lgidstr=pmpe01.cern.ch#31790 result_rc=0 result_summary=Final stats for fid=0 fileUrl=xroot://caaaa000@pmpe06.cern.ch//mockdata/myfile1.txt_1024_0 intended_duration=0.000102 nbissued=1024 nread/vecread_bytes=1024/0 nread/vecread_calls=1/0 duration=0.004259 accessMap=|********************| seekPdfFit=0 nseekbytes=0 nseeks=0 seekedFracPerSeek=0 seeksPerMB=0.000000 Vnchunks/Vnread=0 fsize=1024 lastServer=caaaa000@pmpe15.cern.ch:1095 xrdReqCallbackTime=0.000353 xrdReqOverlapFactor=1.000000 deliveryEff=22.493284 globalRateEstimateMBps=0.000068
Nov 29 11:47:40.741207 pmpe01 [INFO] Finished the filelist, coordinator exiting
The typical pattern is that files from the file list are assigned to a load generator. This is the "Assigning" line above at timestamp 11:47:34.730485. At a later time the completion for the file is received; this is the "Received success" line at timestamp 11:47:39.375672.
Filelist.txt format
The filelist.txt can be filled with a list of files in the following format:
# lines starting like this are ignored
<timestamp> <filename> <file size>
The timestamp is an integer (it must be positive and fit in a signed 32 bit integer). Typically it is a Unix timestamp, with an arbitrary starting point. The timestamps must be non-decreasing from the start to the end of the file.
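For example, a small filelist might look like this (the names and sizes are illustrative; myfile1.txt with size 1024 matches the Quick Start output above):
# two files start together, ten seconds after the first
1567502522 myfile1.txt 1024
1567502532 myfile2.txt 1048576
1567502532 myfile3.txt 524288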
Starting transfer of files by timestamp or rate
By default the coordinator will schedule the files for transfer assuming the timestamp unit is seconds, i.e. if the second file has a timestamp +10 compared to the first one, the second transfer will be started 10 seconds after the first. A scale factor may be given to the coordinator with the "-f" option, which affects this; e.g. with a scale factor of 2.0 the second file in this example would be started 5 seconds after the first.
mockdata-coordinator -f 2.0
The coordinator may also replay transfers in a "smoothed" mode; e.g. specifying "-f 100,500,5000" will replay files from the filelist such that the transfer rate needed to transfer the files starts at 100MB/s and increases linearly to 500MB/s over a duration of 5000s. There must be enough files in the filelist to provide the required traffic, taking into account the file access profiles, which can specify "frac_bytes_read" such that only that fraction of each file's bytes will be read. If file transfers fail after assignment without transferring the expected amount of data, the observed network rate will be less than specified in the smoothing parameters.
mockdata-coordinator -f 100,500,5000
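As a rough worked example of the smoothing parameters: with "-f 100,500,5000" the average rate over the ramp is (100+500)/2 = 300MB/s, so over the 5000s the filelist must supply about 300 x 5000 = 1,500,000MB (roughly 1.5TB) of readable bytes, counting filesize multiplied by frac_bytes_read for each file.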
Scaling considerations
Deploy enough mockdata-loadgenerators to cover the amount of traffic you would like to generate. With a small block size, e.g. 25KB, assume each load generator can:
- support 600MB/s of traffic
- support 200 concurrent transfers
- occupy 4 HT cores of an Intel E5-2630v3 2.4GHz CPU
Larger block sizes will reduce these demands.
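As a rough sizing example based on the figures above: to generate 3GB/s of traffic with 25KB blocks you would need about 3000/600 = 5 load generators, which together could carry around 5 x 200 = 1000 concurrent transfers and would occupy roughly 5 x 4 = 20 HT cores.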