Projects
-
What Is A Snowflake Stage?
At work the past week, I’ve been working with large datasets, trying to figure out the most efficient way to process and transform the data without overloading other teams’ servers.
Disclaimer, I’m not a data scientist and SQL is not my strong suit. Any mistakes in the below are mine :)
So imagine you have a table like this:
First Last Favorite_color Zip_code Angela Apple red 90210 Bruce Banana yellow 02134 Claire Canteloupe orange 20008 … … … … Zed Zucchini …zcolor 12345 And let’s say that table has eleventy billion rows in it. The actual number is not important, for the purposes of our exercise the fact that it is “huge” is the important part.
Now let’s say you want to get everyone’s first name whose Zip code matches a list of 10,000 arbitrary zip codes.*
You could do:
SELECT First FROM my_table WHERE Zip_code in (10001, 10002, 10003 ... 20000)
but that is very slow! I don’t know how long it takes on this hypothetical table but in my real-world scenario, a similar query took 20 seconds for 1000 items in the “in” clause. With the amount of data we need to pull from this table (which is significantly more than 10,000 rows), batching in groups of 1000 would be prohibitively slow.
However, if you had your desired ZIP codes as a separate table,
my_codes
:Desired_zip_codes 10001 10002 … 20000 you could run a much faster query:
SELECT First from my_table INNER JOIN my_codes ON my_table.zip_code = my_codes.desired_zip_codes
This is much faster because sql magic?**
In my case, I had confirmed that a join would be faster than an in clause. So how do I get my list of codes into a JOIN-able format?
A colleague suggested using a Snowflake Stage, which I had never heard of before. The basic documentation on stages explains how to create them. Essentially it’s a loading area for ingesting structured data that is not in your database.
In my case I have a CSV uploaded to S3 with the data I want to join on. So I need to load that into a stage and then I can join against it.
So we create a stage:
CREATE STAGE my_stage URL='s3://mybucket/mypath'
There are a million other params you can pass in to tweak the behavior of said stage, see more on that here. But that’s all it takes to create a very basic stage. Depending on how your database and AWS integrations are set up you probably need to add some permissions. This will be different for every setup so I’ll spare you the gory details of what I did here (it was mostly Terraform and pinging our data-platform team for help). But assuming you’ve created the stage and your database role has permissions to read from it, you are now in business!
Say you have a file in your S3 storage:
mybucket/mypath/data.csv
. It’s now as simple as:SELECT $1, $2 FROM @my_stage/data.csv
Snowflake doesn’t know that the header of your CSV is its column names (if you’re not using a headerless CSV, which is a whole other thing) which is why we have to use positional arguments. However, we can fix that by importing the CSV into a temporary (or permanent, depending on our use case) table:
CREATE TEMP TABLE mytemptable (ZIP number);
Lets now say our zip code list is a one-column CSV:
COPY INTO mytemptable @my_stage/data.csv; SELECT First FROM mytable JOIN mytemptable ON mytemptable.ZIP = mytable.Zip_code;
As before, there are about a thousand different configurable options here. I especially like
MATCH_BY_COLUMN_NAME
, which, when paired withparse_header=TRUE
, automagically determines the column names in your CSV (with headers) and inserts them into the table, even if the columns in the table are in a different order.Note:
MATCH_BY_COLUMN_NAME
doesn’t get you a free pass to not create the temp table, if you want to select by column name. To dynamically create the temp table without knowing the headers in the CSV, that’s a separate post. Or see “Generate column description” in my resources below.That said, I’ve now created a script that does everything I want to and selects the data I want from Snowflake, joining on my imported CSV! It’s still quite slow, taking a few minutes to run each time, but compared to the previous process, this is a huge improvement!
Resources:
- Querying data in staged files
- Generate column description
- Create FILE FORMAT (very useful for specifying options on your incoming CSV)
*I realize this is a contrived example; believe me that my real-world use-case is more relevant.
**I learned in the process of writing this blog post that a JOIN is not always faster than an IN, and ‘real’ data scientists and database admins have dedicated many words to analyzing which query is faster under which circumstances. (Don’t forget that with most database languages you can run something like EXPLAIN PLAN before actually running the query to see how long it might take.)
-
More Fitbit Dev Resources
The developing-for-fitbit journey continues…
Breaking changes between SDK 4.x and 5.x
The older Fitbit I have, a Fitbit Versa 2, uses software written with version 4.x (or lower?) of the Fitbit SDK. Modern Fitbits use… I think it’s up to 6.x? And excitingly, there were a ton of breaking changes between version 4.x and 5.x.
I solved a number of the issues here, but am running into more, especially with premade Fitbit components like buttons–which, incidentally, are very poorly documented.
However! Pure luck and a lot of random web searching led me to this official Fitbit demo project: https://github.com/Fitbit/sdk-exercise/. And we’re in luck, the initial version was written with SDK 3.0 and then was abandoned shortly after, even though all of Fitbit’s developer docs were updated for 5.0 when the Versa 3 came out. In other words, it’s an ideal resource for learning how things ‘used to’ be done.
From that I found that if I want to place a button on the screen on the Versa 3, I would import
<link rel="import" href="/mnt/sysassets/widgets/text_button.defs" />
, but if I want to place a button on the screen on the Versa 2, I should import<link rel="import" href="/mnt/sysassets/widgets/square_button_widget.gui" />
.I’m not even a hundred percent sure how to browse that widgets directory, if it’s even possible; I suspect it’s built into Fitbit’s firmware and can’t be accessed by humans. Which makes this dead repo possibly the best source of truth for SDK 4.0 widget names?
There are a number of other demo projects in various states of completion in the Fitbit github organization. I believe the repos beginning with
sdk-
are the most useful, but take a look.General help
Bless this Aussie developer who has 1) made about a thousand watchfaces for Fitbit (including not one, but two watchfaces that actually play Pong) and 2) written up his own SDK guide filled with helpful tips.
Hopefully these resources help you as they are helping me. And I hope I will have an update on my own app(s) soon!
-
Common Errors When Developing for Fitbit
Or maybe just common to me?
Problem:
Install failed: RPC call to 'app.install.stream.begin' could not be completed as the RPC stream is closed
Cause:
Jury’s out on what causes this. It seems to happen when I sleep my laptop; the Fitbit simulator doesn’t seem to be able to recover.
Solution:
Remove
~/Library/Application\ Support/Fitbit\ OS\ Simulator/
, which contains caches, preferences, etc. You’ll have to reset your preferences afterwards of course. -
Developing for the Fitbit Versa
In my previous post I said something about learning Android app development so I could make a Fitbit app. However, silly me– just because Google owns Fitbit now does not mean that Fitbits run Android. (:
Android Wear is what powers Google-owned smartwatches, but if you have a vanilla Fitbit like I do, it’s Fitbit OS. Clear as mud. The upshot, however, is that if you want to write an app for Fitbit OS, you can use plain ol’ Node. So that’s what we’ll do today.
Fitbit’s dev page has a good tutorial which we’ll follow. First, make sure you have node installed, and have downloaded the Fitbit simulator. (Links on Fitbit’s page.) Then, run
npx create-fitbit-app my-app
to get started. For this blog post I’m following Fitbit’s instructions so the prompts to answer are clock, any name, no companion.cd
into your newly created directory.I’m developing this app for my personal use (for now) and my Fitbit is a Versa 2, running the “mira” version of Fitbit OS. The default settings for
npx create-fitbit-app
assume I have at least a Versa 3. To get my new app onto the Versa 2 simulator (and ultimately onto my wrist) I have to edit my package.json:"devDependencies": { - "@fitbit/sdk": "~5.0.0", + "@fitbit/sdk": "~4.2.0", ... "buildTargets": [ - "atlas", - "vulcan", + "mira" ],
According to this forum user we also need to change some filenames in the scaffolded project. We must rename
index.view
toindex.gui
,widget.defs
towidgets.gui
(note the plural change), and edit the contents ofwidgets.gui
to point to/mnt/sysassets/widgets_common.gui
instead of the file it points to.Also, lol, the highest supported version of node is 16, so make sure you’re using node 16 or lower, preferably with something like
nvm
so you can go back to the present day easily. (Poor Fitbit!)Once you’ve done all that, you should be able to start the fitbit CLI with
npx fitbit
, thenbuild
and (assuming the Fitbit simulator is up and running Mira)install
.Huh. Aren’t clocks supposed to, uh, tell time?
Follow the Fitbit getting started tutorial to turn this blank screen into a clock. TLDR: Edit
resources/index.gui
(remember we renamed this from the.view
file used in later SDK versions),styles.css
, andapp/index.js
with the provided code, runbuild
andinstall
(orbi
which apparently is shorthand for both) and you get a very nice clock:Okay, it’s… fine. But hey, it’s up and running. Now let’s do something cooler with it… in a future post.
-
Firebase Crashlytics and Feature Flagging, or What I Learned at WITS Spring 2024
The following info comes from the Women in Tech Summit which I attended in Philadelphia this weekend. I may type up a more fluffy post about how it felt to attend this conference later, but for now, these are my technical notes from the hands-on session I attended.
Android app development seems a lot more complex than web-app development. I might be wrong, I’ve built a lot of web apps but no Android apps. (Yet; I just got a Fitbit Versa and feel like that postage-stamp-sized screen is crying out for me to do something with. Apparently it doesn’t run DOOM yet…)
Anyway, making an app run on the hugely fragmented Android ecosystem, as well as keeping users’ versions up to date (including forcing them to update if there’s a critical security fix) seems like a huge problem. Presumably there are enterprise solutions for this but in my hypothetical DOOM-on-Fitbit scenario where it’s just me, what’s the solution?
Google Firebase has some great options for solving these problems, which I learned about from two Comcast engineers. Crashlytics can help track and centralize crash data from multiple devices, and Remote Config can manage some app features without having to publish a new version of the app.