Commit 0112d2ab authored by Kyle Bringmans

removed bug with mismatching datatypes

parent 86467e1b
1 merge request: !24 Add ipoc resampling
@@ -136,10 +136,11 @@ class DataProcessor(ABC):
# resample to seconds
seconds = [t1 + step * x for x in range(int((t2 - t1) / step) + 1)]
time = [t for t in seconds if t in ts]
print(seconds, file=sys.stderr)
# only keep ipoc timestamps
return time
# map timestamps to int, because otherwise date_range filters out all entries: an int is never found in a list of strings
timestamps = [int(t) for t in timestamps]
# udf to generate upsampled timestamps
date_range_udf = udf(lambda x, y: date_range(x, y, timestamps, step), ArrayType(LongType()))
# get upper limit of timestamps for final timestamp in dataframe (handles edge case for last timestamp)
@@ -155,7 +156,6 @@ class DataProcessor(ABC):
# join dataframes on timestamp index (rounded to nearest second)
self.data = self.data.dropDuplicates([_dfindex])
# Remove old timestamps if they did not align with ipoc timestamps
self.data = self.data.where(f.col(_dfindex).isin(timestamps))
@staticmethod
......
  • Contributor
    • This fixed the bug that caused all rows to be thrown away
    • Works on 1 month of data
    • Testing on 1 year of data (2017)
  • Contributor
    • Got a core dump when resampling 1 year of data
    • Resampling 1 month gives BroadcastTimeout -> rerunning with the timeout turned off

  • Contributor

    • Retrieved 9 months of data (in 1-month segments)
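
The datatype mismatch fixed in this commit can be reproduced outside Spark. The following is a minimal sketch in plain Python, assuming `date_range` takes `(t1, t2, ts, step)` as suggested by the diff; the sample timestamp values are hypothetical and the Spark UDF wrapper is left out.

```python
def date_range(t1, t2, ts, step=1):
    """Return the resampled timestamps between t1 and t2 that also occur in ts."""
    # resample to seconds
    seconds = [t1 + step * x for x in range(int((t2 - t1) / step) + 1)]
    # only keep timestamps that are present in ts
    return [t for t in seconds if t in ts]


# Hypothetical timestamps, stored as strings as they were before the fix.
raw_timestamps = ["100", "102", "105"]

# Before the fix: an int never compares equal to a str, so nothing matches.
before_fix = date_range(100, 105, raw_timestamps)

# After the fix: cast to int first, so the membership test can succeed.
after_fix = date_range(100, 105, [int(t) for t in raw_timestamps])

print(before_fix)  # []
print(after_fix)   # [100, 102, 105]
```

Python does not raise an error when an `int` is compared with a `str` for equality; the comparison is simply `False`, which is why the original code silently dropped every row instead of failing.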
