index : urls
master
Web crawler in Go.
Brandon Irizarry
Age      Author  Commit message
6 days   demo    feat: upgrade hamlet and use newer stuff from there [master]
6 days   demo    feat: use hamlet package to simplify command-line arguments
6 days   demo    chore: install hamlet
8 days   demo    feat: specify whether URL sources are both missing or both present
8 days   demo    feat: implement shortcode feature
8 days   demo    refactor: move initial URL parsing into function 'convertToURL'
8 days   demo    chore: add urls.csv shortcodes file
8 days   demo    docs: add some comments
8 days   demo    feat: add a user agent header!
8 days   demo    feat: add "total entries" as part of XML comment
8 days   demo    feat: prettify stats when figure was passed in as 0
8 days   demo    feat: place comment before URL listing
8 days   demo    feat: add comment logging maxDepth and maxURLs inside xml output
8 days   demo    chore: ignore xml output
8 days   demo    feat: save sitemap to a file
8 days   demo    feat: check for missing https://
8 days   demo    feat: add header to xml output
8 days   demo    wip: generate rough draft of sitemap
8 days   demo    docs: expand findURLs godoc
8 days   demo    docs: add comment explaining purpose of log.Lshortfile
8 days   demo    refactor: move html document creation to getBatch
8 days   demo    chore: include shortfile printout in log invocations
9 days   demo    fix: make select statement block unless communication takes place
9 days   demo    feat: configure maxDepth from the command line
9 days   demo    wip: prototype a max-depth limitation
9 days   demo    feat: update the classic crawler to track depth via packets
9 days   demo    refactor: move packet definitions to their own file
9 days   demo    refactor: move "packet conversion" into a separate function
10 days  demo    docs: add extensive comments
10 days  demo    feat: measure the depth where each URL is found
10 days  demo    feat: add some prints to prove we need to select on Done()
10 days  demo    refactor: eliminate redundant select statement
10 days  demo    fix: make sure all workers terminate by the end
10 days  demo    feat: add early termination condition based on maxURLs
10 days  demo    feat: add break condition from worklist loop
10 days  demo    feat: add the worker-pool-based crawer from TGPL
10 days  demo    docs: add an "Awesome Go" section to the README
10 days  demo    docs: save websites I usually use with this crawler
10 days  demo    fix: release semaphore at the proper time
10 days  demo    feat: implement maxConcurrency using a buffered channel 'sema'
10 days  demo    feat: add cancellation feature
10 days  demo    feat: hit 'em with the classic web crawler
10 days  demo    refactor: make deduplication part of main goroutine
10 days  demo    feat: restore original "print 45 and hang" behavior
10 days  demo    feat: change all channel payloads to pointer types
10 days  demo    feat: reveal bug in the channel linkage topology
10 days  demo    feat: add some code to cancel
10 days  demo    refactor: use SplitSeq instead of Split
10 days  demo    feat: add gouroutine-leak profiling
10 days  demo    feat: design worker-pool webcrawler