Error when trying to load .json files into a pandas DataFrame

#json #python-3.x #dataframe

Question:

I'm trying to combine several .json files so that I can later run sentiment analysis on them. I have already tried other approaches, but they always ended in an error. I checked whether the .json is correctly formatted and could not find any problems there. I have also attached an example .json file.

The error message is attached below my code.

import glob
import json

# list all files containing News from Guardian API
files = list(glob.iglob('/Users/xxx/tempdata/articles_data/*.json'))

news_data = []
for file in files:
    
    news_file = open(file, "r", encoding = 'utf-8')

    # Read in news and store in list: news_data
    for line in news_file:
        news = json.loads(line)
        news_data.append(news)

    news_file.close()
  

Updated error output

     AttributeError                            Traceback (most recent call last)
<ipython-input-86-3019ee85b15b> in <module>
     12     # Read in news and store in list: news_data
     13     for line in news_file:
---> 14         news = json.load(line)
     15         news_data.append(news)
     16 

~/opt/anaconda3/lib/python3.8/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,

AttributeError: 'str' object has no attribute 'read'
  

2019-11-01.json

 {
     "id":"business/2019/nov/01/google-snaps-up-fitbit-for-21bn",
     "type":"article",
     "sectionId":"business",
     "sectionName":"Business",
     "webPublicationDate":"2019-11-01T14:26:19Z",
     "webTitle":"Google snaps up Fitbit for $2.1bn",
     "webUrl":"https://www.theguardian.com/business/2019/nov/01/google-snaps-up-fitbit-for-21bn",
     "apiUrl":"https://content.guardianapis.com/business/2019/nov/01/google-snaps-up-fitbit-for-21bn",
     "fields":{
        "headline":"Google snaps up Fitbit for $2.1bn",
        "standfirst":"<p>Takeover allows web giant to take on Apple in fast-growing smartwatch and wearables business</p>",
        "trailText":"Takeover allows web giant to take on Apple in fast-growing smartwatch and wearables business",
        "byline":"Kalyeena Makortoff",
        "main":"<figure class=\"element element-image\" data-media-id=\"fc8abb0f70105fcab3aee86dea6c89e211337660\"> <img src=\"https://media.guim.co.uk/fc8abb0f70105fcab3aee86dea6c89e211337660/0_158_3571_2143/1000.jpg\" alt=\"The wireless activity tracker Zip by Fitbit Inc\" width=\"1000\" height=\"600\" class=\"gu-image\" /> <figcaption> <span class=\"element-image__caption\">The wireless activity tracker Zip by Fitbit Inc. Google has confirmed it will buy Fitbit for $2.1bn.</span> <span class=\"element-image__credit\">Photograph: Franck Robichon/EPA</span> </figcaption> </figure>",
        "body":"<p>Google has snapped up the Fitbit... ",
        "newspaperPageNumber":"38",
        "wordcount":"679",
        "firstPublicationDate":"2019-11-01T14:25:58Z",
        "isInappropriateForSponsorship":"false",
        "isPremoderated":"false",
        "lastModified":"2019-11-01T18:56:38Z",
        "newspaperEditionDate":"2019-11-02T00:00:00Z",
        "productionOffice":"UK",
        "publication":"The Guardian",
        "shortUrl":"https://gu.com/p/cjeze",
        "shouldHideAdverts":"false",
        "showInRelatedContent":"true",
        "thumbnail":"https://media.guim.co.uk/fc8abb0f70105fcab3aee86dea6c89e211337660/0_158_3571_2143/500.jpg",
        "legallySensitive":"false",
        "lang":"en",
        "isLive":"true",
        "bodyText":"Google has snapped up the Fitbit activity tracker business in a $2.1bn (\u00a31.6bn) deal that will enable the search giant to go toe-to-toe with Apple in the fast-growing smartwatch and wearables business...",
        "charCount":"4149",
        "shouldHideReaderRevenue":"false",
        "showAffiliateLinks":"false",
        "bylineHtml":"<a href=\"profile/kalyeena-makortoff\">Kalyeena Makortoff</a>"
     },
     "isHosted":false,
     "pillarId":"pillar/news",
     "pillarName":"News"
  },
  

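Comments:

1. The question really has nothing to do with machine-learning or sentiment-analysis; please do not spam irrelevant tags (removed).

Answer #1:

If you are reading JSON from a file, you should use json.load instead of json.loads. json.loads is meant for reading JSON from a string. See the json.load documentation. Note that json.load expects the open file object itself (anything with a .read() method), not an individual line of the file, which is why json.load(line) raises the AttributeError shown in the question.

For example: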

import json

with open('ts.json', 'r') as f:
    content = json.load(f)

print(content)
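
Applied to the loop from the question, the same idea might look like the sketch below. It assumes each file holds either a single article object or a list of them; the function name `load_news_files` and the demo article are made up for illustration, and a temporary directory stands in for the real `articles_data` folder.

```python
import glob
import json
import os
import tempfile


def load_news_files(pattern):
    """Parse every JSON file matching `pattern` and return the results as a list."""
    news_data = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            # json.load takes the file object and parses the whole document,
            # so pretty-printed JSON that spans many lines is read in one go.
            parsed = json.load(f)
        # Assumption: a file holds one article (dict) or a list of articles.
        if isinstance(parsed, list):
            news_data.extend(parsed)
        else:
            news_data.append(parsed)
    return news_data


# Demo with a throwaway directory and a made-up article (not real API data).
with tempfile.TemporaryDirectory() as tmp:
    sample = {"id": "demo/1", "fields": {"headline": "Demo headline"}}
    with open(os.path.join(tmp, "2019-11-01.json"), "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=4)
    articles = load_news_files(os.path.join(tmp, "*.json"))

print(articles[0]["fields"]["headline"])
```

If pandas is installed, passing the resulting list to `pandas.json_normalize` should give a DataFrame in which nested keys such as `fields.headline` become columns; this is a suggestion rather than something tested against the asker's data.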