Documentation on biblio-py

Release: 0.6.0
Date: May 05, 2011

Overview:

Quick-start with Biblio-py

Introduction

Biblio-py is a Python package for managing bibliographies. It currently contains two subpackages:

  • yapbib: Yet Another Python BIBliography manager tool, mainly for BibTeX files but able to export to HTML, LaTeX lists and XML, and to print in ad-hoc formats.
  • query_ads: A simple Python tool for querying Harvard's ADS database.

Features

The most important features are:

  • It is open-source software (GNU General Public License).
  • Parsing of BibTeX and ADS portable formats.
  • Built-in conversion to BibTeX, LaTeX, HTML and XML formats.
  • The native storage format is the standard Python pickle format.
  • Converts accents and some non-English characters to XML entities.
  • Correctly handles some math constructs, currently superscripts and subscripts. It should not be very difficult to add more.
  • Includes scripts to manipulate databases, get bibliography from the ADS online database and extract citations from a LaTeX paper.
  • You can import the module from your own script and perform Python-powered manipulation of the data.
  • Object-oriented model.
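
As an illustration of the accent-to-entity conversion listed above, here is a minimal sketch that uses only the Python standard library. It is not the routine used by biblio-py, and the function name to_xml_entities is hypothetical; it simply replaces every non-ASCII character with a numeric XML character reference.

```python
# Hypothetical helper, NOT part of biblio-py: shown only to illustrate the idea.

def to_xml_entities(text):
    """Replace every non-ASCII character in a unicode string with a
    numeric XML character reference (for example, o-umlaut -> &#246;)."""
    # 'xmlcharrefreplace' is a standard codec error handler that emits
    # &#NNN; for each character the target encoding cannot represent.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


# Plain ASCII text is returned unchanged; accented letters become entities.
converted = to_xml_entities(u'Schr\xf6dinger, E.')
```

Named entities (such as &ouml;) would need an explicit lookup table; the numeric form is used here because it needs no extra data.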

Usage

The package currently includes three scripts:

bibmanage.py
Searches, extracts, sorts and exports bibliography items from databases.
get_papers.py
Queries Harvard's ADS database.
bibextract.py
Generates a restricted bibliography that includes only the items cited in a LaTeX article. It can also clean up the entries.

To get help with any of the scripts just type:

$ <scriptname> -h
$ <scriptname> --help

Create your own scripts

It is easy to use the package in your own custom scripts. The scripts shipped with the package are small but full-fledged examples, and a good way to learn how to work with it.


Examples of use

bibmanage examples

  • Example: Import the BibTeX file myrefs.bib and save its entries to the database myrefs.dmp:

    $ bibmanage.py myrefs.bib -d myrefs.dmp
  • Example: Merge two BibTeX files, create a new BibTeX file and also dump the result to database.dmp:

    • To keep the original keys from the BibTeX files, pass --keep-keys explicitly:

      $ bibmanage.py myrefs1.bib myrefs2.bib -f bib -o allrefs.bib -o database.dmp --keep-keys
    • The following line instead regenerates uniform keys from the data in the items:

      $ bibmanage.py myrefs1.bib myrefs2.bib -f bib -o allrefs.bib -o database.dmp

      This is the default behaviour: it creates a (hopefully) unique key using a simple algorithm. For articles it uses:

      First seven characters of name + year + journal abbreviation + p + page

  • Example: Select only those items with some substring in the key (output to stdout) from a bz2-compressed database:

    $ bibmanage.py myrefs.bib.bz2 -s substring:key -o -
  • Example: Select only those items whose authors include autor1,B. and print them in HTML format to the file autor1.html. The short and long options are equivalent, so either of the following works:

    $ bibmanage.py myrefs.dmp -s autor1,B:author -o autor1.html -f html
    $ bibmanage.py myrefs.dmp --search=autor1,B:author --output=autor1.html --format=html
  • The last example may be repeated using the original BibTeX file as the source, but parsing is slower:

    $ bibmanage.py myrefs.bib -s autor1,B:author -o autor1.html

    Note that the --format option is redundant when the output goes to a file with the right extension.

  • Example: Select only those items whose authors include autor1,B. but not autor2,C, and print them in LaTeX format to the file ejemplo.tex:

    $ bibmanage.py myrefs.dmp -s autor1,B:author -x autor2,C:author -o ejemplo.tex
  • Restrict the last example to publications between the years 2004 and 2006:

    $ bibmanage.py myrefs.dmp -s autor1,B:author -x autor2,C:author --start-year=2004 \
      --end-year=2006 -o ejemplo.tex
  • Working with pipes:

    $ bibmanage -s LastName1:author biblio1.bib -f bib -o - | bibmanage \
      -s LastName2:author biblio2.dmp - -o biblio.html

    This takes the items with LastName1 as author from biblio1.bib and pipes them as input to a second call, which merges them with the items by LastName2 from the database biblio2.dmp. The output is written in HTML format to the file biblio.html.
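
The default key scheme for articles mentioned in the merge example (first seven characters of name + year + journal abbreviation + p + page) can be sketched in a few lines. This is only an illustration of the recipe as stated; the function name make_article_key and its arguments are hypothetical, and the real code may normalise names or abbreviations in ways not shown here.

```python
# Hypothetical sketch of the key recipe described in the text,
# NOT the function biblio-py itself uses.

def make_article_key(name, year, journal_abbrev, page):
    """Build a key as: first seven characters of name + year
    + journal abbreviation + 'p' + page."""
    return '{0}{1}{2}p{3}'.format(name[:7], year, journal_abbrev, page)


# Illustrative values only; the journal abbreviation is made up.
key = make_article_key('Einstein', 1905, 'AnP', 891)
```

Names shorter than seven characters are used whole, since slicing past the end of a string is harmless in Python.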

get_papers examples

  • Example: Get all the papers in the database for one author in a given year:

    $ get_papers.py --author=Einstein,A. --year=1905
    This prints BibTeX entries to standard output. Other options change this behaviour.
  • Example: Use of other options:

    $ get_papers.py --author=Einstein,A. --year=1905 -f html -o biblio.html
    $ get_papers.py --author=Einstein,A. --year=1905 -f latex --output=biblio.tex
    $ get_papers.py --author=Einstein,A. --year=1905 -f latex --sort=date,reverse

bibextract examples

Simple Example:

The simplest use of bibextract is to get the bibliography cited in document.tex (directly or from document.aux). Use one of the following:

$ bibextract document.tex
$ bibextract document.aux

Here the bibtex items are read from a default database $BIBDB. This will create a file: document.bib with only the cited items.
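
To give an idea of what collecting the cited items involves, the sketch below gathers citation keys from the text of a .aux or a .tex file with regular expressions. It is an assumption-laden illustration, not the code of bibextract: the names AUX_RE, TEX_RE and cited_keys are hypothetical, the patterns cover only the common \citation{...} and \cite-like forms, and LaTeX comments are not handled.

```python
import re

# Hypothetical patterns, NOT taken from bibextract.
# .aux files record citations as \citation{key1,key2}
AUX_RE = re.compile(r'\\citation\{([^}]*)\}')
# .tex files use \cite{...}, \citep[p.~3]{...}, \citet*{...}, etc.
TEX_RE = re.compile(r'\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}')


def cited_keys(text, pattern):
    """Return the citation keys found in text, in order of first
    appearance and without duplicates."""
    keys = []
    for group in pattern.findall(text):
        for key in group.split(','):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


tex_source = r'See \cite{a1, b2} and \citep[p.~3]{c3}; again \cite{a1}.'
aux_source = '\\citation{a1}\n\\citation{b2,c3}\n\\bibdata{refs}\n'
tex_keys = cited_keys(tex_source, TEX_RE)
aux_keys = cited_keys(aux_source, AUX_RE)
```

With the list of keys in hand, the restricted bibliography is just the subset of the database whose keys appear in that list.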

Scripting using the library

Simple script

The simplest useful script to custom-convert your database to LaTeX might look like this:

import yapbib.biblist as biblist
#
# Change here to your files
bibfile = 'yourbib.bib'    # input database
outputfile = 'myfile.tex'  # output latex file
# latexstyle, overrides default values
latexstyle={ 'url': None, # Do not include url
             'doi': None, # Do not include doi
             'author': (r'\textbf{',r'}'), # Write the authors in boldface
}

b=biblist.BibList()
b.import_bibtex(bibfile)
# Sort them in your specified order and export them to latex list
b.sort(['year','firstpage','author','journal'],reverse=True)
b.export_latex(outputfile,style=latexstyle)
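
To make the role of a style dictionary such as latexstyle more concrete, here is a self-contained sketch of one way such overrides could be applied. It rests on assumptions read off the comments in the script (None means "do not include this field", a pair of strings wraps the field value) and is not yapbib's implementation; the function format_item and the sample record are hypothetical.

```python
# Hypothetical illustration of style overrides, NOT yapbib's own code.

def format_item(fields, order, style):
    """Join the fields of one record in the given order.

    style maps a field name either to None (skip the field) or to a
    (prefix, suffix) pair that is wrapped around the field value.
    Fields without an entry in style are emitted unchanged.
    """
    parts = []
    for name in order:
        value = fields.get(name)
        wrap = style.get(name, ('', ''))
        if value is None or wrap is None:
            continue  # missing field, or field switched off by the style
        prefix, suffix = wrap
        parts.append(prefix + value + suffix)
    return ', '.join(parts)


# Made-up record and the same kind of overrides as in the script above.
record = {'author': 'Einstein, A.', 'year': '1905', 'url': 'http://example.org'}
style = {'url': None, 'author': (r'\textbf{', r'}')}
line = format_item(record, ['author', 'year', 'url'], style)
```

The point of the design is that the caller overrides only the fields it cares about and everything else keeps its default treatment.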

Some extra manipulation

You can also manipulate the data before converting it (though bibmanage.py already does this):

import yapbib.biblist as biblist
#
# Change here to your files
bibfile = 'yourbib.bib'    # input database
outputfile = 'myfile.tex'  # output latex file
# latexstyle, overrides default values
latexstyle={ 'url': None, # Do not include url
             'doi': None, # Do not include doi
             'author': (r'\textbf{',r'}'), # Write the authors in boldface
}

b=biblist.BibList()
b.import_bibtex(bibfile)
# Select only some items
items= b.search(findstr='name1',fields=['author','key'])

# Create a reduced database
bout= biblist.BibList()
for it in items:
  bout.add_item(b.get_item(it),it)

# Sort them in your specified order and export them to latex list
bout.sort(['year','firstpage','author','journal'],reverse=True)
bout.export_latex(outputfile,style=latexstyle)
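
The selection step in this script amounts to a substring search over a few fields. The sketch below shows that idea on plain dictionaries, so it runs without the package; it is not the code behind b.search, and the function name, the case-sensitive matching and the sample records are all assumptions made for illustration.

```python
# Hypothetical stand-in for a substring search, NOT yapbib's b.search.

def search_keys(items, findstr, fields):
    """Return the keys of the records in which findstr occurs in at
    least one of the listed fields. The pseudo-field 'key' refers to
    the record's own key. Matching is case-sensitive in this sketch."""
    found = []
    for key in sorted(items):
        record = dict(items[key], key=key)
        if any(findstr in str(record.get(f, '')) for f in fields):
            found.append(key)
    return found


# Made-up records keyed by citation key.
database = {
    'Smith2004': {'author': 'Smith, J.'},
    'Doe2005': {'author': 'Doe, A. and Smith, J.'},
    'Roe2006': {'author': 'Roe, R.'},
}
selected = search_keys(database, 'Smith', ['author', 'key'])
```

The returned keys can then be used to build a reduced collection, which is what the loop over items does in the script above.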

A Thesis list

The list of theses completed at our lab was created with the following script:

import yapbib.biblist as biblist

bibfile= 'tesis.bib'
outputfile= 'tesis.html'

htmlstyle={'fields':['author','title','director','school','year'],
           'author': ('<span class="authors">', '</span><BR>'),
           'director':('<BR><span class="director">','</span>. ')}

css_style=""".title a,
.title {font-weight: bold; color: #416DFF; }
ol.bibliography li{ margin-bottom:0.5em;}
.year:before {content:" (";}
.year:after {content:").";}
.authors {font-weight:bold; display:list;}
.authors:after {content:". ";}
.director:before{content:"Director: ";}
"""

head='''
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css">
{0}</style>
<title>Tesis Doctorales</title>
</head>
<body>
<h2>Tesis Doctorales (PhD Thesis)</h2>
<ol class="bibliography">
'''.format(css_style)

b=biblist.BibList()
b.import_bibtex(bibfile)
b.sort(['year','author','reverse'])
b.export_html(outputfile, head= head, style= htmlstyle, separate_css=False)

Exploring the package interactively

>>> import yapbib.biblist as biblist
>>> b=biblist.BibList()
>>> b.import_bibtex('mybib.bib')
>>> items= b.List() # Shows the keys of all entries
>>> items
['KEY1','KEY2']
>>> it= b.get_item(items[0]) # Get first item
>>> it= b.get_items()[0]  # (Alternative) to get first item
>>> it.get_fields() # Show all fields for item
>>> it.preview()    # Show a preview (brief info)
>>> bib= it.to_bibtex() # get item in BibTeX form
>>> tex= it.to_latex() # get item in LaTeX form
>>> html= it.to_html() # get item in html form
>>> print it  # print full information on the item
>>> print unicode(it) # Use this if it has non-ascii characters