pandas gets confused reading a CSV when commas are enclosed in quotes

#python #pandas #dataframe

Question:

 col1, col2, geometry
11.54000000,0.00000000,"{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.8404384478
48506],[-61.300252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.307572399
40571,-33.841967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}"
  

How can I read a CSV with syntax like the above?

I am doing:

import pandas as pd
df = pd.read_csv('file.csv')
  

However, read_csv gets confused by the commas inside "{"type":"Polygon","coordinates": and I want it to ignore commas inside quotes.

Comments:

1. I think this is caused by your CSV file being incorrectly formatted. A double quote (") inside a cell should be escaped as two double quotes ("").

2. What is the source of this badly formatted CSV?

3. Did my answer solve your problem? If not, please follow up so that any outstanding issues can be resolved. Thanks.
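
To illustrate the first comment, here is a minimal sketch. It assumes the file could be regenerated with the inner quotes doubled; the shortened coordinates and the in-memory `raw` string are made up for illustration and are not the asker's data. With that escaping, a plain `pd.read_csv` call should keep the geometry as a single field:

```python
import io
import json

import pandas as pd

# Hypothetical, shortened sample: every inner double quote is doubled ("")
# and the whole geometry field is wrapped in one pair of outer quotes.
raw = (
    'col1,col2,geometry\n'
    '11.54,0.0,"{""type"":""Polygon"",""coordinates"":[[[-61.31,-33.83],[-61.29,-33.83]]]}"\n'
)

# Default quoting rules are enough once the quotes are escaped properly.
df = pd.read_csv(io.StringIO(raw))

# The geometry cell is a JSON string that can be decoded afterwards.
geom = json.loads(df.loc[0, "geometry"])
print(df.shape, geom["type"])
```

If the producer of the file can be changed, fixing the escaping at the source is likely simpler than working around it while reading.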

Answer #1:

Your CSV file contains unescaped double quotes inside the quoted geometry field, which causes problems with reading the file and splitting it into columns.

I tried several methods to read your file correctly. The best one I have found so far is to use the Python engine with a regular-expression separator in read_csv.

import pandas as pd

# these are for viewing the output
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 120)

# The separator matches the format of the string that you provided.
# I'm sure that it can be modified to be more efficient.
df = pd.read_csv('test.csv', skiprows=1, sep=r'(\d{1,2}\.\d{1,8}),(\d{1,2}\.\d{1,8}),("{"type":.*)', engine="python")

# some cleanup
df = df.drop(df.columns[0], axis=1)

# I had to save the processed file
df.to_csv('test_01.csv')

# read in the new file
df = pd.read_csv('test_01.csv', header=None, index_col=0)
print(df.to_string(index=False))
11.54  0.0  "{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.840438447848506],[-61
.300252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.30757239940571,-33.8
41967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}" 
  

Answer #2:

Try this:

pd.read_csv('file.csv', quotechar='"', skipinitialspace=True)
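
If quoting options alone do not help because the inner quotes are unescaped, one alternative is to split each line by hand. The following is only a sketch under stated assumptions: the first two columns never contain commas, the geometry is always the last field, and it is wrapped in exactly one pair of outer quotes. The helper name `read_geometry_csv` and the shortened sample are hypothetical and not part of pandas or the original file.

```python
import io
import json

import pandas as pd


def read_geometry_csv(handle):
    """Parse lines of the form: number,number,"{...JSON...}" (hypothetical helper)."""
    # Header names may carry stray spaces, so strip each one.
    header = [name.strip() for name in handle.readline().split(",")]
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split on the first two commas only; the rest is the geometry field.
        col1, col2, geometry = line.split(",", 2)
        # Drop the single pair of outer quotes around the JSON text.
        if geometry.startswith('"') and geometry.endswith('"'):
            geometry = geometry[1:-1]
        rows.append((float(col1), float(col2), json.loads(geometry)))
    return pd.DataFrame(rows, columns=header)


# Shortened, made-up sample in the same malformed style as the question.
sample = io.StringIO(
    ' col1, col2, geometry\n'
    '11.54000000,0.00000000,"{"type":"Polygon","coordinates":'
    '[[[-61.31,-33.83],[-61.29,-33.83],[-61.31,-33.83]]]}"\n'
)
df = read_geometry_csv(sample)
print(df.loc[0, "geometry"]["type"])
```

For a real file, `open('file.csv')` would be passed in place of the in-memory sample. The geometry column then holds decoded dictionaries rather than raw strings.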