How to Migrate to the iGEM-Wiki: The Freiburg Way
At the end of an iGEM summer, writing the final wiki in time for the wiki freeze is a challenge every iGEM team experiences. Apart from designing the wiki itself with CSS and JavaScript, a main problem is the content. A final iGEM wiki should on the one hand be pleasant to read and inspire people about science; on the other hand, good scientific practice demands that all the information needed to repeat the experiments is provided. Especially with regard to this last aspect, we would like to show how we worked on our wiki and what may be adaptable for future teams.
The Internal Wiki
To start early and to document every step, we worked with a so-called internal wiki that was based on DokuWiki and hosted on the university's web server. Particular advantages of this wiki system were an introductory course provided by the university and the relatively simple formatting syntax, which let us start working right away without a long learning period. We also used the wiki to organize our team, to keep minutes of meetings, and to collect information such as scientific papers and protocols, as well as cooking recipes we tested during these meetings. A major drawback of DokuWiki, however, is that its syntax is not at all compatible with the MediaWiki syntax used on the official external iGEM wiki. To avoid rewriting all our articles, we looked for a way to convert content between these wikis quickly and simply.
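For illustration, the same markup in both syntaxes (the page fragment and image name are invented):

DokuWiki:  ====== Results ======   with **bold** text and an image {{:labjournal:gel.png}}
MediaWiki: == Results ==   with '''bold''' text and an image [[File:Freiburg_gel.png]]

Rather than translating the markup directly, our approach exports the rendered XHTML and fixes the remaining incompatibilities, as described below.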
Migration
[Figure: the form providing input for the wikitranslate script]
Even though DokuWiki provides a feature to export a page's body content (by appending &do=export_xhtmlbody to the URL and viewing the source code via Ctrl+U in Firefox), all the image links refer to the internal server and are therefore broken. The first step in migrating our content was thus a script that downloads all the images of a specified internal wiki page and prefixes the file names with Freiburg_ to avoid name conflicts with other teams. The program also outputs a file with all the file names separated by commas, which is later used to upload the files.
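As a minimal sketch of this export trick (the dummy account is the one used in the scripts below; the page name is the example from the input form):

import requests
from requests.auth import HTTPBasicAuth

# fetch the body-only XHTML export of one internal wiki page
r = requests.get('https://wiki.uni-freiburg.de/igem2015/doku.php'
                 '?id=labjournal:cellfree&do=export_xhtmlbody',
                 auth=HTTPBasicAuth('dummy', 'igem2015'))
print(r.text)  # the image links in this output still point at the internal server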
To transfer the downloaded and renamed files onto the iGEM server, the Selenium browser automation tool (http://www.seleniumhq.org/) was used in combination with the sideflow plugin (https://github.com/73rhodes/sideflow). This plugin allows the creation of loops in Selenium routines, which is needed to upload a list of files. The steps in the Selenium routine are basically the coded equivalent of clicking the upload button, selecting a file, checking the "ignore any warnings" box and submitting the form.
Once the files are on the server, it is necessary to determine their location there so it can be used as the image source in the HTML text of the pages. As the iGEM server assigns a variable path to every file (such as /e/e4/filename.png), this path has to be probed for every single image used on a page.
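A minimal sketch of this probing step (assuming the File: page layout of the 2015 server; the file name is invented):

import urllib3
from bs4 import BeautifulSoup

# the File: page of an uploaded image links to its actual storage path
url = 'https://2015.igem.org/File:Freiburg_example.png'
raw_html = urllib3.PoolManager().request('GET', url).data.decode('utf-8')
soup = BeautifulSoup(raw_html, 'html.parser')
# the div with id="file" contains an a-tag whose href is the variable /e/e4/... path
print('https://2015.igem.org' + soup.find(id='file').find('a').get('href'))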
This routine is implemented in the wikitranslate script, together with a routine that replaces the linked headers with unlinked ones and one that adds preceding and following text, so that, for example, custom styling for a specific page can be included directly in the output. The script runs on a web server (as a CGI script) and can therefore be used by every team member without any programming skills. It returns an HTML file containing the source code to paste into the manually created new external wiki page.
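Because the input form shown below submits via GET, the script can also be called directly by URL; a hypothetical call (parameter names as defined in the form and the CGI code) would be:

wikitranslate.py?dwsite=labjournal:cellfree&rmheaderlink=True&changepicurl=True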
Refining the External Wiki
Although this procedure greatly lowers the workload of transferring our content to the external wiki, the design still needs to be adapted by hand. A major aspect here is keeping the design of the uploaded pages consistent with the pages created specifically for the external wiki, such as the landing page or the project overview pages.
All in all, using an automated uploading and replacing procedure to transfer our content from the internal to the external wiki allowed us to spend more time on the remaining pages, such as our poster and the presentation. We hope that our experience will also be useful for other teams writing their content in advance.
Script-Code
dwfileload.py:
#!/usr/bin/env python
from bs4 import BeautifulSoup  # parse raw html and extract elements
import requests
from requests.auth import HTTPBasicAuth
import cgi
import urllib3

##################
# DEFINITIONS
##################

def getdwsource(dwsite):
    # get the subsite of the internal wiki specified as dwsite
    # use the dummy user to get access to the wiki
    dokuwiki = requests.get('https://wiki.uni-freiburg.de/igem2015/doku.php?id=%s&do=export_xhtmlbody' % dwsite,
                            auth=HTTPBasicAuth('dummy', 'igem2015'))
    # extract the html of the requests-object by using beautifulsoup
    soup = BeautifulSoup(dokuwiki.text, 'html.parser')
    # print('https://wiki.uni-freiburg.de/igem2015/doku.php?id=%s&do=export_xhtmlbody'%site)
    return soup

def getdwpicnames(source):
    ########
    # returns a dict with the names of all images in the source code as keys
    # and the corresponding links to the image and its info-page as values
    ########
    picnamesdic = {}
    for img in source.find_all('img'):
        # extract the name of each picture from the src-string
        try:
            # get the name of the picture
            dwname = img.get('src').split('media=')[1]
            # use the name as key of a dict that stores the links for src and href
            picnamesdic[dwname] = [img.get('src')]
            picnamesdic[dwname].append(img.parent.get('href'))
            print('+ \t %s ' % img.get('src').split('media=')[1])
            # print('dwlink=%s'%picnamesdic[dwname])
        except:
            print('- \t\t %s ' % img.get('src').split('/')[-1])
    return picnamesdic

def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("%3A", ":")
    # this has to be last:
    s = s.replace("&amp;", "&")
    return s

def getpicurl(picname):
    # input: the name of a file uploaded to the iGEM 2015 wiki server
    # IMPORTANT: the picture has to be uploaded before running the script!
    # picname=input('please paste the name of an uploaded iGEM-wiki file:\n')
    # correct picname for the changes the iGEM server requires
    picname = picname.replace(':', '-')
    # build the fixed url on the wiki server
    url = 'https://2015.igem.org/File:Freiburg_%s' % picname
    print('the url I looked for was:\n%s' % url)
    # get raw_html from url as specified here:
    # http://stackoverflow.com/questions/17257912/how-to-print-raw-html-string-using-urllib3
    http_pool = urllib3.connection_from_url(url)
    r = http_pool.urlopen('GET', url)
    raw_html = r.data.decode('utf-8')
    # initialise bs-object
    soup = BeautifulSoup(raw_html, 'html.parser')
    # find the href-link in an a-object in the div with id=file
    try:
        serverlink = 'https://2015.igem.org' + soup.find(id='file').find('a').get('href')
        # return the name and the link
        return '%(x)s, %(y)s' % {'x': picname, 'y': serverlink}
    except:
        return None

###################
# BEGIN PROGRAMME
###################

# ask for the internal wiki site to read
dwsite = input('dwsite (please enclose in quotation marks):\t')
# get the source code
dwsource = getdwsource(dwsite)
# get all picture names within an href
picnamesdic = getdwpicnames(dwsource)
# initialize the dict of urls to download
urldic = {}
# fill the dict with all images not yet found on the iGEM server
for key in picnamesdic:
    if getpicurl(key) == None:
        urldic[key] = '/igem2015/lib/exe/fetch.php?media=' + key
# download the images into the current directory
# (replace unusable characters and prefix Freiburg_)
for url in urldic:
    r = requests.get('http://wiki.uni-freiburg.de' + urldic[url],
                     auth=HTTPBasicAuth('dummy', 'igem2015'), stream=True)
    if r.status_code == 200:
        f = open('Freiburg_' + url.replace(':', '-'), 'wb')
        f.write(r.content)
        f.close()
# write all file names, separated by commas, into files.txt
f = open('files.txt', 'a')
for key in urldic.keys():
    f.write('"{}"'.format(key.replace(':', '-')))
    f.write(', ')
f.close()
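The files.txt written at the end contains the renamed file names, quoted and comma-separated, ready to paste into the YOURARRAY placeholder of the Selenium preset below. Its contents might look like this (names invented):

"Freiburg_gel-1.png", "Freiburg_sds-page.png",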
Presets for Selenium IDE
command | target | value
storeEval | new Array(YOURARRAY); | myarray
getEval | index=0; |
while | index < storedVars['myarray'].length |
open | /Special:Upload |
waitForPageToLoad | |
storeEval | index | temp
echo | javascript{'/home/USER/CITY_'+storedVars['myarray'][storedVars['temp']]} |
type | id=wpUploadFile | javascript{'/home/USER/CITY_'+storedVars['myarray'][storedVars['temp']]}
click | id=wpIgnoreWarning |
clickAndWait | name=wpUpload |
getEval | index++; |
endWhile | |
HTML Form for the Web Server Input
<form method="get" action="wikitranslate.py">
  <div>
    <h3>Choose option:</h3>
    <input type="checkbox" name="rmheaderlink" value="True"> Remove header links<br>
    <input type="checkbox" name="changepicurl" value="True"> Change the picture links<br>
    <input type="checkbox" name="appendtextbefore" value="True"> Add preceding text<br>
    <input type="checkbox" name="appendtextafter" value="True"> Append text<br>
    <input type="checkbox" name="changeprotocols" value="True"> Change protocol links<br>
    <input type="checkbox" name="changeregistry" value="True"> Link registry entries<br>
  </div>
  <h3>Provide information if necessary</h3>
  <div>
    <label for="dwhtml">Name of an internal wiki-page (e.g. <em>labjournal:cellfree</em>):</label>
    <input id="dwhtml" name="dwsite" required>
  </div>
  <div>
    <label for="textbefore">Text to precede</label>
    <textarea id="textbefore" cols="20" rows="5" name="textbefore">
{{Freiburg/CSS}}
<html>
</textarea>
  </div>
  <div>
    <label for="textafter">Text to append</label>
    <textarea id="textafter" cols="20" rows="5" name="textafter">
</html>
</textarea>
  </div>
  <div>
    <input type="submit" value="Submit">
  </div>
</form>
CGI Script to Run on the Web Server
#!/usr/bin/python
import cgi

form = cgi.FieldStorage()  # instantiate only once!
dwsite = form.getfirst('dwsite', 'empty')
#dwsite = 'labjournal:ilab'
isrmheaderlinks = form.getfirst('rmheaderlink', False)
ischangepicurl = form.getfirst('changepicurl', False)
appendtextbefore = form.getfirst('appendtextbefore', False)
appendtextafter = form.getfirst('appendtextafter', False)
ischangeprotocols = form.getfirst('changeprotocols', False)
isregistry = form.getfirst('changeregistry', False)
# presets for testing:
#isrmheaderlinks=True
#ischangepicurl=True
#appendtextbefore=True
#appendtextafter=True
#ischangeprotocols=True
#isregistry=True
#textbefore='some text before'
#textafter='some text after'
textbefore = form.getfirst('textbefore', '')
textafter = form.getfirst('textafter', '')
# avoid script injection by escaping the user input
dwsite = cgi.escape(dwsite)

from bs4 import BeautifulSoup  # parse raw html and extract elements
import urllib3  # read html-text from url
import requests
from requests.auth import HTTPBasicAuth
import sys
from cStringIO import StringIO
import csv
#import html

# redirect stdout so that diagnostic prints do not end up in the CGI response
old_stdout = sys.stdout
sys.stdout = mystdout = StringIO()

# read in the dict with internal and external protocol names
protocolsdic = {}
reader = csv.DictReader(open('protocols_names.csv', 'r'))
for row in reader:
    protocolsdic[row['internal']] = row['external']
# remove empty dict entries
protocolsdic.pop('', None)

# read in the dict with registry entries to replace
registrydic = {}
reader = csv.DictReader(open('registry_links.csv', 'r'))
for row in reader:
    registrydic[row['internal']] = '<a href=%(x)s>%(y)s</a>' % {'x': '"{}"'.format(row['external']), 'y': row['internal']}

################
# BEGIN DEFINITIONS
################

def getdwsource(dwsite):
    # get the subsite of the internal wiki specified as dwsite
    # use the dummy user to get access to the wiki
    dokuwiki = requests.get('https://wiki.uni-freiburg.de/igem2015/doku.php?id=%s&do=export_xhtmlbody' % dwsite,
                            auth=HTTPBasicAuth('dummy', 'igem2015'))
    # extract the html of the requests-object by using beautifulsoup
    soup = BeautifulSoup(dokuwiki.text, 'html.parser')
    # print('https://wiki.uni-freiburg.de/igem2015/doku.php?id=%s&do=export_xhtmlbody'%site)
    return soup

def sfah(source):
    # return the soup-objects of all headers (h1 to h6)
    return (source.findAll('h1') + source.findAll('h2') + source.findAll('h3')
            + source.findAll('h4') + source.findAll('h5') + source.findAll('h6'))

def rmheaderlinks(soup):
    ########
    # removes the a-href links from the headers of an internal dw-file
    ########
    rmheaderlinksdic = {}
    for header in sfah(soup):
        rmheaderlinksdic[unicode(header.a)] = header.a.text
    return rmheaderlinksdic

def getdwpicnames(source):
    ########
    # returns a dict with the names of all images in the source code as keys
    # and the corresponding links to the image and its info-page as values
    ########
    picnamesdic = {}
    for img in source.find_all('img'):
        # extract the name of each picture from the src-string
        try:
            # get the name of the picture
            dwname = img.get('src').split('media=')[1]
            # use the name as key of a dict that stores the links for src and href
            picnamesdic[dwname] = [img.get('src')]
            picnamesdic[dwname].append(img.parent.get('href'))
            print('+ \t %s ' % img.get('src').split('media=')[1])
            # print('dwlink=%s'%picnamesdic[dwname])
        except:
            print('- \t\t %s ' % img.get('src').split('/')[-1])
    return picnamesdic

def getpicurl(picname):
    # input: the name of a file uploaded to the iGEM 2015 wiki server
    # IMPORTANT: the picture has to be uploaded before running the script!
    # picname=input('please paste the name of an uploaded iGEM-wiki file:\n')
    # correct picname for the changes the iGEM server requires
    picname = picname.replace(':', '-')
    # build the fixed url on the wiki server
    url = 'https://2015.igem.org/File:Freiburg_%s' % picname
    #print('the url I looked for was:\n%s' % url)
    # get raw_html from url as specified here:
    # http://stackoverflow.com/questions/17257912/how-to-print-raw-html-string-using-urllib3
    http_pool = urllib3.connection_from_url(url)
    r = http_pool.urlopen('GET', url)
    raw_html = r.data.decode('utf-8')
    # initialise bs-object
    soup = BeautifulSoup(raw_html, 'html.parser')
    # find the href-link in an a-object in the div with id=file
    try:
        serverlink = 'https://2015.igem.org' + soup.find(id='file').find('a').get('href')
        # return the link
        return serverlink
    except:
        return None

def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    # this has to be last:
    s = s.replace("&amp;", "&")
    return s

def changeprotocols(soup):
    returndic = {}
    for link in soup.findAll('a'):
        linksource = link.get('href')
        for name in protocolsdic:
            if linksource != None:
                if unicode(name) in linksource:
                    print(unicode(name))
                    # generate pairs for replacement using the absolute path
                    # of the iGEM server protocols section
                    returndic[linksource] = unicode('https://2015.igem.org/Team:Freiburg/Protocols/') + protocolsdic[name]
    return returndic

##########################
# BEGIN PROGRAMME
##########################

# set the subprogram counter
prog_count = 0
# get the source code
dwsource = getdwsource(dwsite)
# convert it to replaceable text
exthtml = unicode(dwsource)
# initialize the dict of elements to replace
rpdic = {}

### is rmheaderlinks ###
if isrmheaderlinks:
    # compute the dict to replace header links
    rpdic.update(rmheaderlinks(dwsource))
    prog_count += 1

### is changeprotocols ###
if ischangeprotocols:
    rpdic.update(changeprotocols(dwsource))
    prog_count += 1

### is changepicurl ###
missingimage = False
if ischangepicurl:
    picnamesdic = getdwpicnames(dwsource)
    for key in picnamesdic:
        serverlink = getpicurl(key)
        if serverlink != None:
            rpdic.update({cgi.escape(picnamesdic[key][0]): serverlink})
            if picnamesdic[key][1]:
                rpdic.update({cgi.escape(picnamesdic[key][1]): serverlink})
        else:
            missingimage = True
    prog_count += 1

### is registry ###
if isregistry:
    rpdic.update(registrydic)
    prog_count += 1

### cancel output if no program was called ###
if prog_count == 0:
    sys.exit(0)

### replacing ###
exthtmlold = exthtml
for text in rpdic.keys():
    # exthtml = exthtml.replace(cgi.escape(text), unescape(rpdic[text]))
    exthtml = exthtml.replace(text, rpdic[text])

# restore stdout and send the response
sys.stdout = old_stdout
if not missingimage:
    print "Content-Disposition: attachment; filename=\"%s.html\"\r\n\n" % dwsite
    if appendtextbefore:
        print(textbefore.encode('utf8'))
    print(exthtml.encode('utf8'))
    if appendtextafter:
        print(textafter.encode('utf8'))
else:
    print "Content-type: text/html \n"
    print('There is an image missing!!')
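The two CSV files read by the script are not reproduced here. Judging from the csv.DictReader calls, both need the column headers internal and external; a minimal sketch of protocols_names.csv (entries invented) could be:

internal,external
labjournal:protocols:transformation,Transformation
labjournal:protocols:miniprep,Miniprep

registry_links.csv follows the same two-column layout, with the external column holding the full URL of the corresponding registry entry.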