1 ( 2 ) Quasi-Single Field Inflation with Large Mass
2 ( 2 ) Odd-dimensional de Sitter Space is Transparent
3 ( 2 ) Chain Inflation Reconsidered
4 ( 8 ) Angular 21 cm Power Spectrum of a Scaling Distribution of Co
5 ( 111 ) Dark Energy
... ...
39 ( 12 ) Inflation with High Derivative Couplings
Papers with citations: 39 ; Citations: 1083
110 => 111 in Dark Energy
Save citations (default yes, type n otherwise)?
Citations saved.
Note added: The code below runs in Python 3. As a kind tester told me, it does not behave well in Python 2. To make it work there, change saveQ = input(...) to saveQ = raw_input(...). Optionally, also remove the "()" following "print" to get the correct output format in Python 2.
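If you would rather not edit the script by hand for each Python version, the prompt can be wrapped in a small shim. This is only a sketch: the names read_line and wants_save are made up for illustration and do not appear in the script below.

```python
# Pick whichever line-reading function exists in the running interpreter.
try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3


def wants_save(prompt="Save citations (default yes, type n otherwise)? ",
               reader=None):
    """Return True unless the user answers 'n' or 'N' (hypothetical helper)."""
    if reader is None:
        reader = read_line
    answer = reader(prompt)
    return answer not in ('n', 'N')
```

The reader argument exists only so the function can be exercised without a terminal, for example wants_save(reader=lambda prompt: 'n').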
The mechanism is to run a shell wrapper that downloads the INSPIRE web page and saves it to a file, then use Python to parse it. Here is the code in case you are interested. Unfortunately there are a few hard-coded path names -- I am just too lazy to change them ^_^
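For anyone adapting the code, one way around the hard-coded path names is to build every file name from a single base directory. The sketch below is an assumption, not part of my script: the helper data_path and the environment variable CHECK_CITATION_DIR are hypothetical names.

```python
import os


def data_path(name, base_dir=None):
    """Join a data file name onto one configurable base directory.

    CHECK_CITATION_DIR is a made-up environment variable; if it is not
    set, the current working directory is used instead.
    """
    if base_dir is None:
        base_dir = os.environ.get("CHECK_CITATION_DIR", os.getcwd())
    return os.path.join(base_dir, name)
```

With this, open(data_path('citations.dat'), 'wb') would replace the full path written out in the Python code.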
Shell script:
#!/bin/bash
LOCALPWD=/home/wangyi/Dropbox/local/check_citation
wget "https://inspirehep.net/search?ln=en&ln=en&p=author%3AY.Wang.39&of=hb&action_search=Search&sf=&so=d&rm=&rg=100&sc=0" -O $LOCALPWD/inspires.html
python $LOCALPWD/cc.py $LOCALPWD/inspires.html
Python code:
import re
import sys
import pickle


def cut_page(page):
    # Split the page into one chunk per record
    paras = re.findall(r'<!C-START REC 11.Brief--!>.+?<abbr class="unapi-id"', page, re.DOTALL)
    # The last record has no following marker, so grab it separately
    last = re.findall(r'.+(<!C-START REC 11.Brief--!>.+)', page, re.DOTALL)[0]
    paras.append(last)
    return paras


def get_citations(page):
    # Each entry is a tuple (title, e-print number, citation count as a string)
    paragraphs = cut_page(page)
    citations = []
    for para in paragraphs:
        re_match = re.findall(r'<a class = "titlelink" href=".+?">(.+?)\.<.+?<br/>e-Print: <b>(.+?)</b>.+?Cited by ([0-9]+?) record', para, re.DOTALL)
        if re_match != []:
            citations.append(re_match[0])
    return citations


def sum_citations(citations):
    total = 0
    for item in citations:
        total = total + int(item[2])
    return total


def get_htm_string(fn):
    try:
        htmfile = open(fn)
    except IOError:
        # Exit with a message if the downloaded page is missing
        sys.exit("Error: input file " + fn + " not found. Abort.")
    htm_string = htmfile.read()
    htmfile.close()
    return htm_string


def save_citations(citations, last_citations):
    saveQ = input("Save citations (default yes, type n otherwise)? ")
    if (saveQ != 'n' and saveQ != 'N'):
        with open('/home/wangyi/Dropbox/local/check_citation/citations.dat', 'wb') as db_file:
            pickle.dump(citations, db_file)
        with open('/home/wangyi/Dropbox/local/check_citation/last_citations.dat', 'wb') as db_file:
            pickle.dump(last_citations, db_file)
        print('Citations saved.')
        return
    print('Citations not saved.')


def load_last_citations():
    # Return an empty list on the first run, when nothing has been saved yet
    try:
        db_file = open('/home/wangyi/Dropbox/local/check_citation/citations.dat', 'rb')
    except IOError:
        return []
    with db_file:
        return pickle.load(db_file)


def compare(citations, last_citations):
    # Papers are matched by e-print number (item[1])
    newsQ = False
    for item in citations:
        match = False
        for last_item in last_citations:
            if item[1] == last_item[1]:
                match = True
                break
        if not match:
            print("New paper: ", item[0])
            newsQ = True
            continue
        if item[2] != last_item[2]:
            print(last_item[2], " => ", item[2], " in ", item[0][:60])
            newsQ = True
    return newsQ


def print_stat(citations):
    for n in range(len(citations)):
        print(n+1, "(", citations[n][2], ") ", citations[n][0][:60])
    print()
    print("Papers with citations: ", len(citations), "; Citations: ", sum_citations(citations))
    print()


##### main #####
htm_string = get_htm_string(sys.argv[1])
citations = get_citations(htm_string)
print_stat(citations)
last_citations = load_last_citations()
newsQ = compare(citations, last_citations)
if newsQ:
    save_citations(citations, last_citations)
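The comparison in compare() loops over the saved list once for every paper. As an alternative sketch (not what the script above does), the saved counts can be put in a dictionary keyed by e-print number first. The function name diff_citations and the e-print identifiers in the example are made up for illustration; the tuples follow the same (title, e-print, count) layout that get_citations() returns.

```python
def diff_citations(current, previous):
    """Return a list of ('new' | 'changed', title, old_count, new_count)."""
    # Map e-print number -> previously saved citation count
    previous_counts = {eprint: count for _, eprint, count in previous}
    changes = []
    for title, eprint, count in current:
        if eprint not in previous_counts:
            changes.append(("new", title, None, count))
        elif previous_counts[eprint] != count:
            changes.append(("changed", title, previous_counts[eprint], count))
    return changes


# Toy data with placeholder e-print identifiers
previous = [("Dark Energy", "eprint-A", "110")]
current = [("Dark Energy", "eprint-A", "111"),
           ("Chain Inflation Reconsidered", "eprint-B", "2")]

for kind, title, old, new in diff_citations(current, previous):
    if kind == "new":
        print("New paper: ", title)
    else:
        print(old, " => ", new, " in ", title[:60])
```

Returning the changes as data rather than printing inside the loop keeps the reporting separate, so the same result could decide whether to prompt for saving.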