Sourcefinder is about testing the performance and quality of the Duchamp Sourcefinding application.
We've built a simulated cube of the sky containing various radio sources, and it's the job of Duchamp to work out where the sources are. Our volunteers are currently running Duchamp over the whole cube of simulated data to work out how many of the radio sources in the cube it can find. Duchamp will need to be able to identify correct sources while keeping false positives to a minimum.
We welcome any sort of feedback, advice, or bug reports. You can either make a post on the forums or send us an email at email@example.com. We're happy to hear from you.
The future of Sourcefinder
As some of you will remember from my previous post, Sourcefinder is going to see some significant changes coming in the next few months. I thought it was time I properly outlined exactly what's happening.
The introduction of SoFiA
I'm currently working on integrating the SoFiA sourcefinding application into Sourcefinder. In order to integrate SoFiA cleanly, I'm working on a fairly significant overhaul of a lot of the Sourcefinder backend systems so that they support multiple sourcefinding applications. My aim here is to make it as easy as possible to add new sourcefinding applications to the system in the future. If anyone is interested, you can see the changes I'm making in the module_rework branch of our git repository.
I'll most probably be sending out quite a few test work units while working on integrating SoFiA, so you'll probably get odd bursts of work until it's integrated properly.
Once SoFiA is working correctly, we'll be processing all of the work units in the simulated cube again, but this time using SoFiA instead of Duchamp.
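For anyone curious what "support for multiple sourcefinding applications" could look like, here is a minimal sketch of one common approach: a registry that maps an application name to the function that runs it. This is not the code in the module_rework branch. Every name here (register, run_sourcefinder, available_apps, the runner signature) is hypothetical, and the runners are empty stand-ins rather than real calls to Duchamp or SoFiA.

```python
from typing import Callable, Dict, List

# Hypothetical runner signature: (cube path, parameter dict) -> detected sources.
Runner = Callable[[str, dict], List[dict]]

# Maps an application name to the function that runs it.
_REGISTRY: Dict[str, Runner] = {}


def register(name: str) -> Callable[[Runner], Runner]:
    """Decorator that adds a sourcefinding application to the registry."""
    def decorator(func: Runner) -> Runner:
        if name in _REGISTRY:
            raise ValueError(f"sourcefinder already registered: {name!r}")
        _REGISTRY[name] = func
        return func
    return decorator


@register("duchamp")
def run_duchamp(cube_path: str, params: dict) -> List[dict]:
    # Stand-in only: a real module would invoke Duchamp here.
    return []


@register("sofia")
def run_sofia(cube_path: str, params: dict) -> List[dict]:
    # Stand-in only: a real module would invoke SoFiA here.
    return []


def available_apps() -> List[str]:
    """Names of all registered sourcefinding applications, sorted."""
    return sorted(_REGISTRY)


def run_sourcefinder(app_name: str, cube_path: str, params: dict) -> List[dict]:
    """Look up the named application and run it on one cube."""
    try:
        runner = _REGISTRY[app_name]
    except KeyError:
        raise ValueError(f"unknown sourcefinder: {app_name!r}") from None
    return runner(cube_path, params)
```

With a layout like this, adding a third application would only mean writing one more decorated function; the code that hands out work would not need to change.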
SoFiA vs Duchamp research paper
The scientists who will be using the data from this project plan on writing a research paper comparing the performance of Duchamp and SoFiA as sourcefinders. From what I've been told, the data analysis side of this project is most likely to be performed by an ICRAR studentship student either at the end of this year or the end of next year.
I plan on ensuring that as many people as possible who contributed to Sourcefinder will have their names/usernames listed in the research paper before it's published.
You'll hear more about this paper in a few months once SoFiA is integrated properly into Sourcefinder.
Real data from ASKAP
As I stated in the previous post, we should have some real data from ASKAP to process on Sourcefinder in the coming months. The moment this data becomes available to me, I'll be sending out work units for Duchamp, and later for SoFiA. I don't have a timeframe on when this data will be available aside from "soon", but I'm hoping we'll see it within a few months.
Visualisation of Sourcefinder results
I plan on developing a little web applet that will probably live on http://www.theskynet.org to allow anyone to view the sources found by Duchamp and SoFiA. My current plan for this applet is to display an image of the cube slice that the source was found in, a small highlight indicating the source, and a list of the users who contributed to finding the source.
I'll be starting work on this applet after SoFiA is integrated into Sourcefinder.
The work units that were lost in the back end storage issue that I spoke about in the last post have all been reprocessed (thank you!). This means there won't be a significant number of work units for Sourcefinder for a little while. I'll try to make this period as short as possible (hopefully a month or two at most), but it really depends on how easy it is to integrate SoFiA.
Project URL change
At some point I plan on changing the project URL from https://sourcefinder.theskynet.org/duchamp to https://sourcefinder.theskynet.org/sourcefinder. The original name 'duchamp' was a carry-over from before I inherited this project. I didn't think we'd be running multiple applications, so I just left it. Obviously once SoFiA is working, the 'duchamp' part of the URL won't make much sense, so I'll be changing it to the more generic 'sourcefinder'. I'll give everyone a week's notice before I change anything, so you should have time to change over easily. I'll also ensure the old URL still works, but simply redirects to the new one.
There's currently a poll up for adding Sourcefinder to the Gridcoin whitelist. If you're interested in voting yes or no, please check out the post Erkan made about it.
I think that's about everything I have for now. I'll try to keep everyone as updated as possible on all of these issues.
Thank you again for helping out with Sourcefinder!
Edit: Additional Information as of 2nd August, 2017
The ASKAP data is still a work in progress, and I've been given an ETA of "before the end of this year". Data measurements on ASKAP have been taken at different rotations of the Earth, and so need to be Doppler corrected before they can be stacked into a cube appropriately. This process is still being worked on, but they expect to make significant progress on finalising it in September.
SoFiA work units will have to be around 100 MB as opposed to the 10 MB of Duchamp work units. I've been told that this is because SoFiA requires a larger cube to develop a source reliability estimate. The 10 MB cubes that Duchamp used simply aren't large enough to develop a meaningful reliability measure.
In order to not reduce the number of work units by a factor of 10, I plan on releasing the same cube multiple times with a different parameter set for each work unit.
Originally with Duchamp, each cube was released as one work unit with 176 different parameters to run on that cube.
With SoFiA, each cube will be released in multiple work units, with a smaller number of parameters per cube.
Ultimately, this will result in a set of work units that are larger and take slightly longer to run than the Duchamp ones.
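The idea of releasing one cube as several work units can be sketched roughly as below. This is only an illustration: the function name, the cube name, and the figure of 16 parameter sets per SoFiA work unit are all made up, and the SoFiA parameter count is assumed to match Duchamp's 176 purely to keep the numbers simple.

```python
from typing import List, Sequence, Tuple


def split_into_work_units(
    cube_name: str, parameter_ids: Sequence[int], per_unit: int
) -> List[Tuple[str, List[int]]]:
    """Pair one cube with consecutive chunks of its parameter sets.

    Each returned tuple stands for one work unit: the cube to process and
    the parameter sets to run on it.
    """
    if per_unit <= 0:
        raise ValueError("per_unit must be positive")
    units = []
    for start in range(0, len(parameter_ids), per_unit):
        chunk = list(parameter_ids[start:start + per_unit])
        units.append((cube_name, chunk))
    return units


# Duchamp style: one work unit carrying all 176 parameter sets.
duchamp_units = split_into_work_units("cube_0001", range(176), per_unit=176)

# SoFiA style (assumed numbers): the same cube released several times,
# each time with a smaller slice of the parameter sets.
sofia_units = split_into_work_units("cube_0001", range(176), per_unit=16)

print(len(duchamp_units), "Duchamp work unit(s)")
print(len(sofia_units), "SoFiA work unit(s)")
```

Splitting by parameter set is what keeps the number of work units up even though each cube is roughly ten times larger.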
19 Jul 2017, 0:14:08 UTC
Real data coming soon!
I've just arrived back from a meeting with Kirsten, and I have some extremely good news to share with everyone!
ASKAP has almost completed observations of its first ~500GB cube, and we're going to be processing it when it's done!
Running a sourcefinding application on ASKAP data hasn't been done yet, so this is going to be a world first.
While we're waiting for ASKAP to be done, we're going to be running the simulated super cube again, this time using a new sourcefinding application called SoFiA.
We're going to be looking at whether SoFiA is any better at finding sources than Duchamp is.
Once we have data from both Duchamp and SoFiA, I'll be working on cross matching the sources they found with the original sources catalogue, then we'll have a clear picture of which is the better sourcefinder.
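A very simplified picture of what cross matching involves is sketched below. It assumes each source is just an (x, y, z) position in cube pixel coordinates and uses a greedy nearest-neighbour match within a fixed distance. The function name, the tolerance, and the example positions are invented for illustration; the real analysis may well use a different matching method.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]


def cross_match(
    found: List[Position], catalogue: List[Position], max_distance: float
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Greedily match found sources to catalogue sources by distance.

    Returns (matches, false_positives, missed):
      matches         - (found index, catalogue index) pairs
      false_positives - found indices with no catalogue source in range
      missed          - catalogue indices that nothing matched
    """
    unmatched = list(range(len(catalogue)))
    matches = []
    false_positives = []
    for i, position in enumerate(found):
        best_index = None
        best_distance = max_distance
        for j in unmatched:
            distance = math.dist(position, catalogue[j])
            if distance <= best_distance:
                best_index, best_distance = j, distance
        if best_index is None:
            false_positives.append(i)
        else:
            matches.append((i, best_index))
            # Each catalogue source may be claimed only once.
            unmatched.remove(best_index)
    return matches, false_positives, unmatched


# Invented example positions, in pixels.
catalogue = [(10.0, 10.0, 5.0), (40.0, 42.0, 7.0), (80.0, 15.0, 3.0)]
found = [(10.5, 9.8, 5.0), (60.0, 60.0, 1.0), (79.6, 15.3, 3.2)]

matches, false_positives, missed = cross_match(found, catalogue, max_distance=2.0)
print("matched:", matches)
print("false positives:", false_positives)
print("missed:", missed)
```

Counting matches, false positives, and missed sources for each sourcefinder is what makes a like-for-like comparison possible.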
I'm also told that there's a very high chance of a paper being written on the comparison between Duchamp and SoFiA as sourcefinders, so I'll be sure to keep you all up to date if I hear any more about it.
Finally, there's been a bit of a back end storage issue with some of the data we've processed, and I'll need to re-run a few sets of work units, so expect some more work to be coming fairly soon.
Edit: Please see my other post that expands on some of the things I've discussed in this one.
21 Jun 2017, 4:52:54 UTC
Congratulations everyone, we made it through the first 681GB of Sourcefinder cubelets!
You've all done amazing work so far, and everyone at theSkyNet thanks you for it.
Now that we've processed a large portion of the simulated cube data, I'm going to be focusing on developing some visualisation tools to allow everyone to actually see the sources they've processed. This tool is going to be incorporated into theSkyNet.org.
You're basically going to be able to select a cubelet and parameter set to view, and you'll be shown the locations of the sources within that cubelet along with the names of the users who found the sources.
I'm also planning on adding credit stats from Sourcefinder to your dashboard on theSkyNet.org at some point.
To my knowledge, there should also be another ~300GB of data somewhere to process, as the original supercube was around 1TB in size. I believe this data wasn't extracted from the supercube for whatever reason, so I'm going to be working out how to extract it.
Until then, there won't be any more Sourcefinder work units, unfortunately.
Anyway, thank you again to all of you for crunching all of this data. I hope you all enjoy the visualisation tool when I've finished building it :)
14 Jun 2017, 0:15:54 UTC
Update 5 June 2017
The set of work units I'm pushing out this week is the last of our 681GB batch of data.
To my knowledge, there is still more data to process once this is done, but I'll need to find out how to extract it from the main supercube.
All of the data up until now was extracted ahead of time for me.
Here are our stats for this week:
Total Cubes: 40269
Total Results: 28105
Total Canonical Results: 12197 (30.2888077678%)
Average Results Per Cube: 0.697931411259
Good Results: 26197 (93.2111723893%)
Bad Results: 1908 (6.78882761075%)
Client Bad: 1906 (6.78171143925%)
Client InProgress: 2 (0.00711617149973%)
Client Good: 26197 (93.2111723893%)
Server Inactive: 0 (0.0%)
Server Unsent: 0 (0.0%)
Server InProgress: 581 (2.06724782067%)
Server Over: 27524 (97.9327521793%)
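For anyone wondering where the percentages come from, the 5 June figures above are consistent with the following arithmetic: the canonical percentage is taken against the total number of cubes, and the good and bad percentages against the total number of results. This is a reconstruction from the published numbers, not the actual stats script, and the variable and function names are made up.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, or 0.0 if whole is zero."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


# Counts copied from the 5 June 2017 update.
total_cubes = 40269
total_results = 28105
canonical_results = 12197
good_results = 26197
bad_results = 1908

canonical_pct = percent(canonical_results, total_cubes)
average_results_per_cube = total_results / total_cubes
good_pct = percent(good_results, total_results)
bad_pct = percent(bad_results, total_results)

print(f"Canonical: {canonical_pct:.2f}% of cubes")
print(f"Average results per cube: {average_results_per_cube:.3f}")
print(f"Good: {good_pct:.2f}%  Bad: {bad_pct:.2f}% of results")
```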
Update 31 May 2017
Not much this week, just a new set of work units.
Here are this week's stats:
Total Cubes: 39928
Total Results: 38293
Total Canonical Results: 11109 (27.8225806452%)
Average Results Per Cube: 0.959051292326
Good Results: 35874 (93.6829185491%)
Bad Results: 2419 (6.31708145092%)
Client Bad: 2405 (6.28052124409%)
Client InProgress: 14 (0.0365602068263%)
Client Good: 35874 (93.6829185491%)
Server Inactive: 0 (0.0%)
Server Unsent: 9512 (24.8400490951%)
Server InProgress: 1777 (4.6405348236%)
Server Over: 27004 (70.5194160813%)