Skip to content

Ensure hdf file is created with HDFStore

Jonas Eschle requested to merge proper_hdf_init into master

Short version and solution

Using HDFStore(filename, mode='a').append() requires that the file was created with HDFStore itself and not e.g. open(file). Solution: To ensure that, a file

  • that already exists and
  • "contains" a frame of length 0

is assumed to be "non-existent" (not created with HDFStore) and mode='w'.

Longer explanation

A test in test_data.py passed in py2 but failed in py3 due to a very internal, pytables error. This fail comes from creating a file without using the HDFStore constructor, or further below a very similar case to ours. "Luckily", this failure comes from a function inside the tests, so probably inside the package we do not violate this.

The creation with another constructor could also happen somewhere when creating a file with h5py (pd.HDFStore used PyTables internally, they are not (yet) compatible. Therefore also !45 (merged) (removes h5py as dependency).

Python versions: luckily this failed loudly in python 3 (on a trivial tuple, tuple comparison somewhere deep inside pytables... :S). It has to be assumed that this failed weirdly, silently in python 2.

On the fix: the golden solution would be to "check whether this file was created with HDFStore". I did not figure out a good way to achieve that, though the current solution should work under "reasonable" conditions.

Sidenote

I am not sure if this is somehow related or a completely different issue, but for some reasons test_load_with_weights in test_data.py passes (after the fix) seamless on my local machine (Ubuntu 17.10, conda env similar to CI setup) with py2 and py3 but fails on the CI machines (py2 is simply not responding anymore whereas py3 creates a core dump originating from an unrecoverable stack overflow inside PyTables. Hunting the problem down, it seems as if it has to do with the get_data function (or the way we create a tempfile).

This is probably completely unrelated! (for example, it may has to do with the tempfile creation first)
But still, something is at least fishy about that...

What do you think? Merge this one?

Edited by Jonas Eschle

Merge request reports