› Forums › Archived Forums › Old General Forum (Busted) › User Stats?
This topic has 14 replies, 7 voices, and was last updated 22 years ago by houseofbrew.
01/14/2004 at 6:47 am #1721134
Maybe someone can help me out. Has anyone in the WGA attempted an automated way to gather stats for WI geocachers? I’ve been pulling some long nights finishing up some scripts, when I realized that someone may already have a quick way to get these stats.
If not, I’ll keep coding for the next few nights. (I’ve got a command line tool now that gets details on caches found by username…added the database…but I still have a fairly large TODO list).
01/14/2004 at 12:07 pm #1746167
I hate to dim your hopes, but I do hope you're more successful than the others who have tried and failed before you.
Groundspeak has protected the site rather well from people who try automated gathering of stats. IP blocking and other methods of prevention are in place. Part of the reason is that the tactics used in the past required too much of the server. We have all noticed the server is already overloaded each weekend.
If you want to compare manually entered stats for WI cachers, Cheesehead Dave has a good site HERE.
There is also a national database HERE, but again, it is manually entered information provided by each individual cacher.
I wish you luck!
01/14/2004 at 12:15 pm #1746168
Yep, I had a PHP script that ran under cron to collect find data. It lasted about two days before it was IP banned. I wish you luck in trying, but don't be surprised if you end up the same way…
01/14/2004 at 3:25 pm #1746169
Well, that sure stinks. Just when I get the script working, too! If sites like geocaching.com would just publish RSS feeds, people like us wouldn't have to write scrapers. I have an email in to the main site, but I haven't heard back yet. Thanks for the warning, though. I may have to rethink some of my strategy.
01/15/2004 at 2:24 am #1746170
OK, so I took server load into account: it waits several minutes between each cacher update and only uses one web hit per cacher. I'll let it run for a few days, and I'll let everyone know.
01/16/2004 at 3:39 pm #1746171
Another tactic you could try is bouncing the request through a number of public proxies to avoid the IP ban.
For example: http://www.publicproxyservers.com/page1.html
The script could just cycle through the list.
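As a rough sketch of that rotation idea (a hypothetical illustration, not code from this thread: the proxy file name, addresses, and fetch command are all made up), each request could go out through the next proxy in the list:

```shell
#!/bin/sh
# Hypothetical sketch: read a list of public proxies and send each
# request through the next one in turn, so no single source IP
# accumulates enough hits to get banned.
PROXIES="proxies.txt"        # one host:port per line (placeholder file)
printf '10.0.0.1:8080\n10.0.0.2:8080\n10.0.0.3:8080\n' > "$PROXIES"

n=0
used=""
while IFS= read -r proxy; do
    n=$((n + 1))
    used="$used$proxy "
    # The real fetch would go through the proxy, e.g.:
    # curl --silent --proxy "http://$proxy" "$URL" > "page_$n.html"
done < "$PROXIES"
echo "rotated through $n proxies"
```

Whether a given public proxy actually forwards the request (or is even still up) would have to be checked per proxy; dead entries would need to be skipped.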
01/16/2004 at 3:44 pm #1746172
Something like this:
function sendToHost($proxy, $host, $method, $path, $data, $useragent = 0)
{
    // Supply a default method of GET if the one passed was empty
    if (empty($method)) {
        $method = 'GET';
    }
    $method = strtoupper($method);

    // Connect through the proxy (assumes it listens on port 8080)
    $fp = fsockopen($proxy, 8080);
    if (!$fp) {
        return false;
    }

    if ($method == 'GET') {
        $path .= '?' . $data;
    }

    fputs($fp, "$method http://$host/$path HTTP/1.0\r\n");
    fputs($fp, "Host: $host\r\n");
    if ($useragent) {
        fputs($fp, "User-Agent: MSIE\r\n");
    }
    if ($method == 'POST') {
        fputs($fp, "Content-type: application/x-www-form-urlencoded\r\n");
        fputs($fp, "Content-length: " . strlen($data) . "\r\n");
    }
    fputs($fp, "Connection: close\r\n\r\n");
    if ($method == 'POST') {
        fputs($fp, $data);
    }

    // Read the full response, then clean up
    $buf = '';
    while (!feof($fp)) {
        $buf .= fgets($fp, 128);
    }
    fclose($fp);
    return $buf;
}

01/16/2004 at 4:27 pm #1746173
Actually, I'm using curl (http://curl.haxx.se/) and lynx, which does some nice web browser emulation. I'm using Perl to manage everything, and I've set a sleep of 60 seconds between grabbing each user's stats. Now that I think of it, I'll add a random number between 1 and 30 to that, so from an access.log read it'll look like I'm just browsing geocaching.com at random. I'm already fetching users' stats in a random order each time (don't want consistent logs).
I feel this is fair to the geocaching.com web server, too. If I had the time, I'd be hitting all of these pages by hand each night and updating a spreadsheet; the script just lets me do that without being at my computer, while emulating normal website usage as closely as possible. I'm a paying member of geocaching.com, and I'd even consider paying more if I had easier access to this kind of information.
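The pacing described above (random fetch order, 60 seconds plus a random 1-30 extra between hits) could be sketched in shell like this. This is a hypothetical illustration: the user numbers, file name, and profile URL are placeholders, not details from this thread.

```shell
#!/bin/sh
# Hypothetical sketch of the pacing described above.
USERS="users.txt"            # one geocaching.com user number per line
printf '101\n102\n103\n' > "$USERS"

# Randomize the fetch order so the logs never show the same sequence.
shuffled=$(awk 'BEGIN { srand() } { print rand(), $0 }' "$USERS" \
    | sort -n | cut -d' ' -f2)

for id in $shuffled; do
    # 60 seconds plus a random 1-30 extra, as described above
    delay=$(awk 'BEGIN { srand(); print 60 + int(rand() * 30) + 1 }')
    echo "would fetch user $id after ${delay}s"
    # sleep "$delay"
    # curl --silent "http://www.geocaching.com/profile/?id=$id" > "user_$id.html"
done
```

The `sleep` and `curl` lines are commented out so the sketch runs without hitting any server; the shape of the profile URL would have to be confirmed against the live site.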
01/19/2004 at 4:30 am #1746174
OK guys, I know it's cold outside, and digging through the snow may not seem like fun, but I really think you need to get out more.
01/19/2004 at 6:18 am #1746175
If I got out more, I wouldn't have time to add owned stats!
Here they are:
http://www.igotsomestuff.com/cgi-bin/showstats.pl

Anyone have a nice big list of WI geocacher usernames? Until I get the script that automatically finds WI cachers and adds them to the database, I've been digging through logs and adding people in by hand.
Thanks!
01/19/2004 at 6:24 am #1746176
quote:
Originally posted by houseofbrew:
Anyone have a nice big list of WI geocacher usernames? Until I get the script that automatically finds WI cachers and adds them to the database, I've been digging through logs and adding people in by hand. Thanks!
Yup: http://wi-geocaching.com/membership/list.php?sort=created
01/19/2004 at 6:30 am #1746177
Maybe it would be good to write a script to pull usernames off the WGA's "recent logs" page. This would keep you from tracking inactive teams and also build a record of new cachers. The Beast had great records, until the number of new cachers grew so large that handwritten records became way too much of a task.
A link to e-mail you on the site would be good too.
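The username harvesting suggested above could be sketched against a saved copy of the page. This is hypothetical: the HTML structure below is invented for illustration, and the real markup of the WGA "recent logs" page would need to be inspected first.

```shell
#!/bin/sh
# Hypothetical sketch: extract finder usernames from a saved
# "recent logs" page and de-duplicate them. The <td class="finder">
# markup is a placeholder, not the page's actual structure.
cat > recentlogs.html <<'EOF'
<td class="finder">Cheesehead Dave</td>
<td class="finder">houseofbrew</td>
<td class="finder">Cheesehead Dave</td>
EOF

names=$(sed -n 's|.*<td class="finder">\(.*\)</td>.*|\1|p' recentlogs.html | sort -u)
echo "$names"
```

The de-duplicated list could then be diffed against the existing database to spot new cachers, which is the record-keeping benefit described above.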
01/19/2004 at 1:36 pm #1746178
Great idea, Cathunter.
See, the biggest problem I'm running into is that you can't look up someone's stats by username; you have to use their user number. That's found by visiting the user's page and copying the portion of the URL that contains the number. I've got a script now that can obtain user numbers, but you have to feed it the exact geocaching.com username.
I'm pretty sure that in the next few nights I'll be able to connect to the WGA page, use those names to look up numbers, and get them going for recurring stats.
Thanks for all the advice everyone has given. I hope everyone will be able to enjoy these stats once they're more complete.
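The lookup step described above (fetch the profile page, then copy the user number out of a URL on it) could be automated with a one-line extraction. This is a hypothetical sketch: the HTML snippet and the `profile/?id=` link format are stand-ins, and the real page layout would need to be confirmed against gc.com.

```shell
#!/bin/sh
# Hypothetical sketch: pull the numeric user id out of a profile
# link in a saved page. The snippet below is a placeholder for a
# real fetched page (e.g. via curl).
cat > results.html <<'EOF'
<a href="/profile/?id=48152">houseofbrew</a>
EOF

id=$(sed -n 's|.*profile/?id=\([0-9]*\).*|\1|p' results.html)
echo "user id: $id"
```

Chained after the username list, this would remove the copy-and-paste step entirely: fetch each profile page once, extract the number, and store the username-to-number mapping in the database.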
01/20/2004 at 2:11 pm #1746179
I can provide a list of about 800 names belonging to Wisconsin cachers that have appeared on gc.com over the past two years. This represents about 2/3 of the Wisconsin cachers active during that time, and includes >80% of those who are currently active.
01/20/2004 at 2:36 pm #1746180
Great!
jcb at integralpro.com

If you could, please send me an email at the above address (notice that I spelled out the @ so that spiders don't pick up my address and spam me).
The forum 'Old General Forum (Busted)' is closed to new topics and replies.