python - How to add data to a dict-like Item Field using ItemLoaders?
I'm using Scrapy's XPathItemLoader, but its API only documents adding values to an item field, not to anything deeper :( I mean:

    def parse_item(self, response):
        loader = XPathItemLoader(response=response)
        loader.add_xpath('name', '//h1')

will add the values found by the XPath to item.name, but how do I add them to item.profile['name']?
XPathItemLoader.add_xpath doesn't support writing to nested fields. You should construct the profile dict manually and write it via the add_value method (in case you still need to go with loaders). Or you can write your own custom loader.
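To illustrate the "write your own custom loader" option, here is a minimal pure-Python sketch. NestedLoader and its dotted-path convention ("profile.name") are hypothetical and not part of Scrapy's API; the sketch only shows how values could be collected per path and then nested into a dict when the item is loaded. The field values are made up.

```python
# Hypothetical sketch: NOT Scrapy's ItemLoader API. It only illustrates the
# "custom loader" idea: collect values under dotted paths such as
# "profile.name" and build the nested dict when the item is loaded.
from collections import defaultdict


class NestedLoader:
    """Collects values per dotted field path and nests them on load."""

    def __init__(self):
        # Maps a dotted path ("profile.name") to the list of values added to it.
        self._values = defaultdict(list)

    def add_value(self, path, value):
        # Accept a single value or a list of values.
        if isinstance(value, list):
            self._values[path].extend(value)
        else:
            self._values[path].append(value)

    def load_item(self):
        item = {}
        for path, values in self._values.items():
            *parents, leaf = path.split(".")
            node = item
            # Walk (and create) the intermediate dicts for each path segment.
            for key in parents:
                node = node.setdefault(key, {})
            # Keep only the first collected value for each leaf field.
            node[leaf] = values[0] if values else None
        return item


loader = NestedLoader()
loader.add_value("profile.name", "Ada Lovelace")
loader.add_value("profile.born", "1815")
loader.add_value("url", "http://example.com/ada")
print(loader.load_item())
# prints {'profile': {'name': 'Ada Lovelace', 'born': '1815'}, 'url': 'http://example.com/ada'}
```

In a real Scrapy project the same idea would more likely live in a subclass of the library's loader or in a post-processing step; the dotted-path splitting above is just one possible convention.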
Here's an example using add_value:
    from scrapy.contrib.loader import XPathItemLoader
    from scrapy.item import Item, Field
    from scrapy.selector import HtmlXPathSelector
    from scrapy.spider import BaseSpider


    class TestItem(Item):
        others = Field()


    class WikiSpider(BaseSpider):
        name = "wiki"
        allowed_domains = ["en.wikipedia.org"]
        start_urls = ["http://en.wikipedia.org/wiki/Main_Page"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            loader = XPathItemLoader(item=TestItem(), response=response)

            # Build the dict by hand: {link text: link URL}
            others = {}
            crawled_items = hxs.select('//div[@id="mp-other"]/ul/li/b/a')
            for link in crawled_items:
                href = link.select('@href').extract()[0]
                name = link.select('text()').extract()[0]
                others[name] = href

            # Add the whole dict as a single value of the "others" field
            loader.add_value('others', others)
            return loader.load_item()
Run it via: scrapy runspider <script_name> --output test.json
The spider collects the links under "Other areas of Wikipedia" on the Wikipedia main page and writes them as a dictionary into the others field.
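If you want to check the dict-building step without running a crawl, the sketch below reproduces the same {link text: href} loop using only the standard library's html.parser. The HTML snippet is invented and only loosely mimics the mp-other block; it is not Wikipedia's real markup. The parser is also simpler than the spider's XPath: it takes every <a> anywhere inside div#mp-other rather than only the ul/li/b/a path.

```python
# Stdlib-only sketch of the spider's dict-building loop (others[name] = href).
# SAMPLE_HTML is a made-up stand-in for the "mp-other" block, not real markup.
from html.parser import HTMLParser

SAMPLE_HTML = """
<div id="mp-other">
  <ul>
    <li><b><a href="/wiki/Wikipedia:Community_portal">Community portal</a></b> - ...</li>
    <li><b><a href="/wiki/Wikipedia:Help_desk">Help desk</a></b> - ...</li>
  </ul>
</div>
"""


class OtherLinksParser(HTMLParser):
    """Collects {link text: href} for <a> tags nested inside div#mp-other."""

    def __init__(self):
        super().__init__()
        self.others = {}
        self._div_depth = 0  # > 0 while inside div#mp-other (counts nested divs)
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks of the current <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth or attrs.get("id") == "mp-other":
                self._div_depth += 1
        elif tag == "a" and self._div_depth:
            self._href = attrs.get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "a" and self._href is not None:
            # Same mapping the spider builds: others[name] = href
            self.others["".join(self._text).strip()] = self._href
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


parser = OtherLinksParser()
parser.feed(SAMPLE_HTML)
print(parser.others)
```

The resulting dict is what the spider passes to loader.add_value('others', others) as a single value.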
Hope that helps.