4/25/2023

Convert db3 file csv

I already posted an answer on how to do this using Apache Drill. However, if you are familiar with Python, you can now do it using Pandas and PyArrow.

Install the dependencies with pip: pip install pandas pyarrow

Or using conda: conda install pandas pyarrow -c conda-forge

To convert a CSV to Parquet in chunks (csv_to_parquet.py): open the CSV as a stream with csv_stream = pd.read_csv(csv_file, sep='\t', chunksize=chunksize, low_memory=False); guess the schema of the CSV file from the first chunk with parquet_schema = pa.Table.from_pandas(df=chunk).schema; open a writer with parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema, compression='snappy'); then convert each chunk with table = pa.Table.from_pandas(chunk, schema=parquet_schema) and write it out.

We can also now read CSV files directly into PyArrow Tables using pyarrow.csv. I haven't benchmarked this code against the Apache Drill version, but in my experience it is plenty fast, converting tens of thousands of rows per second (this depends on the CSV file, of course!).