Python JSON filter by date key and write to new JSON file

Question

I have a Python script that will read through some JSON files and then import these to MongoDB.

I want it to insert only records whose published key is at most 1 month old.

My current code is:

import json
import logging
import logging.handlers
import os
import pymongo
from pymongo import MongoClient


def import_json(mongo_server,mongo_port, vuln_folder):
    try:
        logging.info('Connecting to MongoDB')
        client = MongoClient(mongo_server, mongo_port)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        logging.info('Connected to MongoDB')
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + vuln_folder
        filedir = os.chdir(archive_filepath)
        file_count = 0
        for item in os.listdir(filedir):
            if item.endswith('.json'):
                file_name = os.path.abspath(item)
                with open(item, 'r') as currentfile:
                    vuln_counter = 0
                    duplicate_count = 0
                    logging.info('Currently processing ' + item)
                    file_count +=1
                    json_data = currentfile.read()
                    vuln_content = json.loads(json_data)
                    for vuln in vuln_content:
                        try:
                            del vuln['_type']
                            new_vuln = {key: vuln[key] for key in vuln if key != '_source'}
                            new_vuln.update(vuln['_source'])
                            coll.insert(new_vuln, continue_on_error=True)
                            vuln_counter +=1
                        except pymongo.errors.DuplicateKeyError:
                            duplicate_count +=1

                logging.info('Added ' + str(vuln_counter) + ' vulnerabilities for ' + item)
                logging.info('Found ' + str(duplicate_count) + ' duplicate records!')
                os.remove(file_name)
        logging.info('Processed ' + str(file_count) + ' files')
    except Exception as e:
        logging.exception(e)

I am thinking that I could use an if statement (pseudocode!):

filter_vuln = if vuln.published = datetime.now -1:
              coll.insert(filter_vuln)

I am guessing that this would drop any records not matching that pattern?

The JSON looks like this:

[
  {
    "_index": "bulletins",
    "_type": "bulletin",
    "_id": "OPENWRT-SA-000001",
    "_score": null,
      "lastseen": "2016-09-26T15:45:23",
      "references": 
      "affectedPackage": [
        {
          "OS": "OpenWrt",
          "OSVersion": "15.05",
          "packageVersion": "9.9.8-P3-1",
          "packageFilename": "UNKNOWN",
          "arch": "all",
          "packageName": "bind",
          "operator": "lt"
        }
      ],
      "edition": 1,
      "description": "Some Description",
      "reporter": "OpenWrt Project",
      "published": "2016-01-24T13:33:41",
      "modified": "2016-01-24T13:33:41",
  },

Some data has been removed from the above JSON for brevity as the actual record is quite long, and this is one of the shorter ones!

Answer

I'm guessing that by "within the last month" you mean the last 30 days. If so, you need timedelta, as in this example:

from datetime import timedelta, datetime

today = datetime.now()
lastmonth = today - timedelta(days=30)

tests = ['2017-11-21', '2017-10-20']

for date in tests:
    # ISO-style date strings sort chronologically, so a plain string comparison works here
    if date >= str(lastmonth):
        print(date)

The output depends on the current date; at the time of writing, the result was: 2017-11-21

This is only an example of how to filter by date.
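
To apply the same idea to the records in the question, one option is to parse the published string into a datetime and compare it against the cutoff, rather than comparing strings. The sketch below is only an illustration under some assumptions: the helper names (is_recent, filter_recent, write_recent) are hypothetical, the timestamp format is taken from the sample record ("2016-01-24T13:33:41"), and published is looked up under _source when that key exists, otherwise at the top level of the record.

```python
import json
from datetime import datetime, timedelta

# Format of the sample timestamps in the question, e.g. "2016-01-24T13:33:41"
PUBLISHED_FORMAT = "%Y-%m-%dT%H:%M:%S"


def is_recent(record, now, max_age_days=30):
    """Return True if the record's 'published' date is within max_age_days of now."""
    # Assumption: 'published' lives under '_source' if present, else at the top level.
    fields = record.get("_source", record)
    published = fields.get("published")
    if published is None:
        return False  # records without a date are treated as not recent
    published_at = datetime.strptime(published, PUBLISHED_FORMAT)
    return published_at >= now - timedelta(days=max_age_days)


def filter_recent(records, now=None, max_age_days=30):
    """Keep only the records published within the last max_age_days."""
    now = now or datetime.now()
    return [r for r in records if is_recent(r, now, max_age_days)]


def write_recent(src_path, dst_path, now=None, max_age_days=30):
    """Read a JSON list from src_path and write the recent records to dst_path."""
    with open(src_path) as src:
        records = json.load(src)
    recent = filter_recent(records, now, max_age_days)
    with open(dst_path, "w") as dst:
        json.dump(recent, dst, indent=2)
    return len(recent)
```

In the import loop from the question, the same check could guard the insert, for example by calling is_recent(vuln, datetime.now()) before coll.insert(...), so that older records are skipped instead of being written to MongoDB.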
