About Python Pickle & HDF5 File Formats

* When Python data is saved in pickle format, the serialized output can differ between Python and library versions, so data integrity is not guaranteed when the file is loaded with a different library version than the one that wrote it. Pickle is therefore best treated as a short-term storage format.
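A small standard-library sketch of why the version matters. `Point` is a hypothetical stand-in for a class from a third-party library (for example a pandas object), and deleting it simulates a library upgrade that renames or removes that class. The sketch assumes Python 3.8+ (needed for protocol 5).

```python
import pickle

# 1) The byte layout depends on the pickle protocol, which depends on the Python version:
#    protocol 2 is readable by Python 2.3+, protocol 5 only by Python 3.8+.
data = {"ids": [1, 2, 3], "name": "sample"}
blob_v2 = pickle.dumps(data, protocol=2)
blob_v5 = pickle.dumps(data, protocol=5)
print(blob_v2[:2], blob_v5[:2])  # b'\x80\x02' b'\x80\x05' (PROTO opcode + protocol number)
print(blob_v2 == blob_v5)        # False: same object, different bytes

# 2) Instances of user or library classes are stored as "module + class name" plus the
#    instance state; the class code itself is NOT saved in the pickle.
class Point:  # hypothetical stand-in for a library class
    def __init__(self, x, y):
        self.x, self.y = x, y

blob_obj = pickle.dumps(Point(1, 2))
print(b"Point" in blob_obj)      # True: only the class name is recorded

# Simulate a newer library version in which the class was renamed or removed.
del Point
try:
    pickle.loads(blob_obj)
    load_failed = False
except AttributeError as exc:
    load_failed = True
    print("load failed:", exc)
```

Because only the class name is stored, any change to that class in a later library release (renamed, moved, or with a different internal layout) can make old pickles unreadable or silently wrong.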

* HDF5 (Hierarchical Data Format)

  * A file format designed for storing large volumes of array data in scientific computing.
  * Interfaces are available for C, Java, Julia, MATLAB, and Python.
  * Supports on-the-fly compression, so data with repeated patterns is stored more efficiently.
  * To store data with pandas: pd.HDFStore('File_Name')
  * To load data: pd.read_hdf('File_Name', 'Key_Name'). The storage format, 'fixed' or 'table', is chosen when writing (e.g. store.put(key, df, format='table')); pd.read_hdf reads either one without a format argument.
  * HDF5 is not a database. It is optimized for write-once, read-many data, and performance degrades when the file has to be written and read frequently.
  * Data can be appended to a file at any time, but the file can become corrupted if multiple writers write to it simultaneously.
  * If your data analysis is I/O-bound rather than CPU-bound, switching to a format like HDF5 can improve performance.
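A minimal sketch of the pandas workflow described above, assuming pandas, NumPy, and PyTables (the `tables` package, which pandas uses for HDF5 I/O) are installed. The file name `example.h5` and the keys `fixed_frame` / `table_frame` are hypothetical. It writes the same DataFrame in both storage formats with zlib compression, appends rows to the 'table' node, and reads everything back with `pd.read_hdf`.

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDF5 support in pandas additionally requires PyTables ("tables")

df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 0.5})
path = os.path.join(tempfile.mkdtemp(), "example.h5")  # hypothetical file name

# Writing through HDFStore; complevel/complib turn on on-the-fly compression.
with pd.HDFStore(path, mode="w", complevel=9, complib="zlib") as store:
    store.put("fixed_frame", df, format="fixed")  # fast, but cannot be appended to or queried
    store.put("table_frame", df, format="table")  # slower, but appendable and queryable
    store.append("table_frame", df)               # add 5 more rows to the 'table' node
    keys = sorted(store.keys())                   # keys are returned with a leading '/'

# Reading: the second argument is the key the object was stored under.
# No format argument is needed; pandas detects 'fixed' vs 'table' from the file.
fixed_back = pd.read_hdf(path, "fixed_frame")
table_back = pd.read_hdf(path, "table_frame")
subset = pd.read_hdf(path, "table_frame", where="index >= 3")  # 'where' works only for 'table'

print(keys)
print(len(fixed_back), len(table_back), len(subset))
```

The 'fixed' format is the quicker choice for data written once and read whole, while 'table' trades some speed for `append` and `where` queries. In this sketch the appended rows reuse index values 0 to 4, so the `where` query matches index values 3 and 4 in both copies.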

OliverHouse