urls
master
Web crawler in Go.
Brandon Irizarry
Age | Commit message | Author
8 days | feat: check for missing https:// | demo
8 days | feat: add header to xml output | demo
8 days | wip: generate rough draft of sitemap | demo
8 days | docs: expand findURLs godoc | demo
8 days | docs: add comment explaining purpose of log.Lshortfile | demo
8 days | refactor: move html document creation to getBatch | demo
8 days | chore: include shortfile printout in log invocations | demo
9 days | fix: make select statement block unless communication takes place | demo
9 days | feat: configure maxDepth from the command line | demo
9 days | wip: prototype a max-depth limitation | demo
9 days | feat: update the classic crawler to track depth via packets | demo
9 days | refactor: move packet definitions to their own file | demo
9 days | refactor: move "packet conversion" into a separate function | demo
10 days | docs: add extensive comments | demo
10 days | feat: measure the depth where each URL is found | demo
10 days | feat: add some prints to prove we need to select on Done() | demo
10 days | refactor: eliminate redundant select statement | demo
10 days | fix: make sure all workers terminate by the end | demo
10 days | feat: add early termination condition based on maxURLs | demo
10 days | feat: add break condition from worklist loop | demo
10 days | feat: add the worker-pool-based crawer from TGPL | demo
10 days | docs: add an "Awesome Go" section to the README | demo
10 days | docs: save websites I usually use with this crawler | demo
10 days | fix: release semaphore at the proper time | demo
10 days | feat: implement maxConcurrency using a buffered channel 'sema' | demo
10 days | feat: add cancellation feature | demo
10 days | feat: hit 'em with the classic web crawler | demo
10 days | refactor: make deduplication part of main goroutine | demo
10 days | feat: restore original "print 45 and hang" behavior | demo
10 days | feat: change all channel payloads to pointer types | demo
10 days | feat: reveal bug in the channel linkage topology | demo
10 days | feat: add some code to cancel | demo
10 days | refactor: use SplitSeq instead of Split | demo
10 days | feat: add gouroutine-leak profiling | demo
10 days | feat: design worker-pool webcrawler | demo
10 days | docs: remove comment | demo
12 days | refactor: move main logic into separate function | demo
13 days | feat: avoid sending empty URL slices to the worklist | demo
13 days | feat: add diagnostics to prove code is buggy | demo
13 days | docs: add package godoc | demo
13 days | docs: expound help string for -max argument | demo
13 days | fix: check *maxURLs > 0 case | demo
13 days | feat: add maxURLs CLI flag | demo
13 days | feat: implement cancellation | demo
13 days | feat: add semaphore to throttle concurrent GET requests | demo
13 days | refactor: remove intermediate variable | demo
13 days | docs: add comment for clarity | demo
13 days | feat: implement simple BFS webcrawler | demo
14 days | feat: create empty main.go | demo
2026-05-21 | chore: add existing code to project | demo