What is a good algorithm to interpolate missing time-series data?

If your time series is very slowly varying or smooth, a low-pass filter such as a moving average works well enough. You will, however, not be able to recover any "high frequency" information in the original time series. Chances are, though, that whether you are dealing with financial time series, geophysical data, or EEGs, your data is rarely smooth, so it is preferable to use methods that extract more information.

Kernel Smoothing Methods: Kernel smoothing methods will look for other patterns in the time series and average points together only when their neighborhoods appear similar to each other. The "kernel" performs the relevant matching and weights different samples according to their similarity. To start off, you might want to use the Parzen-Rosenblatt or Nadaraya-Watson estimators.

Splines for interpolation: Another alternative to low-pass filtering is to assume that the underlying continuous data is best modeled as a combination of piecewise polynomials of different orders. Linear interpolation assumes that the relationship in the interval between two samples is a first-order polynomial (a straight line), while cubic interpolation assumes that a third-order polynomial describes that interval. In general, when the underlying signal is smooth, a higher-order polynomial is more accurate and helps you recover a cleaner and smoother version of your series. [2a-b] survey different interpolation methods and compare linear (moving average), quadratic and cubic interpolation in medical imaging, but the same applies to time series as well. One advantage of the spline basis is that it can handle non-uniform spacing of samples.

Signal Estimation using Wavelet Transforms: The wavelet basis is optimal both for approximating point singularities in a time series and for stochastic estimation of such signals. This is probably the hardest to get a handle on if you are unfamiliar with multiresolution and wavelet analysis, but it is definitely the most desirable one if accuracy is important.

Implementations of all of these can be found in the R package for Time Series [4].

[1] Kernel Regression Smoothing of Time Series
[2a]
[2b]
[3] Wavelet Methods for Time Series
[4]
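
As a rough illustration of two of the ideas in the answer above, here is a minimal Python sketch that fills one missing point in a non-uniformly sampled series, once by linear interpolation and once with a Nadaraya-Watson estimate. It is not taken from the R package mentioned above. The sine-wave data, the Gaussian kernel, the bandwidth of 0.5, and the function names are all assumptions made for the example, and the kernel here weights samples by closeness in time only.

```python
import math

def linear_interpolate(ts, ys, t):
    """Piecewise-linear estimate at time t from samples sorted by time."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    for i in range(1, len(ts)):
        if t <= ts[i]:
            # Fraction of the way from sample i-1 to sample i.
            w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
            return (1 - w) * ys[i - 1] + w * ys[i]

def nadaraya_watson(ts, ys, t, bandwidth):
    """Gaussian-kernel weighted average of all observed samples at time t."""
    weights = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in ts]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical data: a sine wave observed at irregular times, with a gap
# between t = 1.5 and t = 2.6 where the value at t = 2.0 is missing.
observed_t = [0.0, 0.4, 0.9, 1.5, 2.6, 3.0, 3.7, 4.1]
observed_y = [math.sin(t) for t in observed_t]

missing_t = 2.0
linear_fill = linear_interpolate(observed_t, observed_y, missing_t)
kernel_fill = nadaraya_watson(observed_t, observed_y, missing_t, bandwidth=0.5)

print(f"true={math.sin(missing_t):.3f} "
      f"linear={linear_fill:.3f} kernel={kernel_fill:.3f}")
```

The bandwidth controls how far the kernel reaches: a small value tracks nearby samples closely, a large value approaches a global average. In practice it would be chosen by something like cross-validation rather than fixed by hand.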

What do you think about the poll that 45% of Republicans said they strongly or somewhat support the storming of the Capitol? If you are one of those, why do you think so, and if you are Republican, but disagree, why do you think so many approve it?

That poll[1] is, not to put it too bluntly, garbage.

Market research is the “family business” so I’m reasonably certain about my conclusions, but just think it through logically:

The YouGov poll that claims something like 45% of Republicans support the attack on the Capitol was released to the public when exactly? Only just over two hours from the event itself, at 7:20 PM on January 6.

And according to their own website:

“YouGov polled 1,448 registered voters, including 1,397 who were aware of the events at the Capitol. The survey was conducted on January 6, 2021 between 5:17 p.m. and 5:42 p.m. Eastern time.”

They left the survey “live” (i.e., collecting data) for twenty-five minutes, and it was before the attack had even finished (the last insurrectionists weren’t removed from the Capitol until after 6-ish).

Now, they used weighting to norm the data to basic demographics:

“Data is weighted on age, gender, education level, political affiliation and ethnicity to be nationally representative of adults in the United States.”

But there’s a fatal flaw in it. One that every credible market researcher and pollster knows, because it’s pretty much legend in the industry, the kind of legend that’s a cautionary tale: Dewey Defeats Truman.

For some reason, the Wiki article[2] on that (in)famous headline doesn’t explain it. But the reason that not only the Tribune but many other papers got it so badly wrong was because they relied very heavily on a Reader’s Digest poll. The poll had very broad reach and high sample size, so it was considered quite reliable at the time.

But… it was biased towards people who subscribe to Reader’s Digest, who were generally somewhat wealthier, more suburban, and more conservative than the population at large.

[EDIT: Well, that’s embarrassing. I mixed up my polling blunders and also my publications. See Max Sklar’s comment, here. The point still stands, and I think I’ll leave the answer as-is for now.]

In this case, the YouGov poll captured people who were online and in a mood to answer surveys at ~5:30 PM on January 6.

Among Republicans specifically… that biases heavily towards the alt-right, keyboard-warrior, MAGA crowd.

The rest of us either weren’t online at all (watching the events on TV instead, or just busy elsewhere, particularly in time zones other than EST, where the workday was still ongoing), or we were online and horrified, and not at all in the mood to click on some snap poll.

Obviously, some did, which is why you see only 45% in favor rather than, say, 85%. But that 45% is still a ginned-up number, I’d guess well more than double what it truly is. Personally, I’ve encountered only a very small percentage of Republicans who will even make excuses for the attack, and even fewer who actually support it.

“Snap polls” are simply not reliable (at least, for politics). At all. And there’s precious little way in which they can be made reliable.

Go with online implementation, and you bias one way, because a certain subset of the population spends more time online and is more likely to click a poll and answer immediately rather than saying, “I’ll check that out after dinner.”

Go with old-fashioned phone calls, and you bias a different way, because a certain subset of the population is more likely to be near their phone at a given time and more likely to pick up on an unknown number rather than saying, “If it’s actually important, they’ll leave a voicemail.”

And those biases are not ones that can be corrected by weighting alone, because they aren’t simply biases in age, or race, or sex, or any other univariate demographic with already known distribution.

Snap polls have their place, for instance in soliciting the “hot take” of industry experts in a particular industry on a subject of specific relevance to that industry.

You could probably construct a methodologically valid snap poll of, say, automotive engineers on the subject of a new Tesla prototype test that just went public, because there’s no real reason to think that the opinions of automotive engineers about automotive engineering will differ based on whether they spend more or less time online in the mid-afternoon, or based on their eagerness or lack thereof to answer polls. (But then again, maybe not: at the time, no one could think of reasons why people’s answers would be different based on whether they had a Reader’s Digest subscription…)

But in politics, where nearly every behavioral habit correlates to some political belief or other, sample bias is fatal and not really possible to correct “on the back end” by simply making compensatory tweaks to the data.

Imagine if someone had tried to predict the 2020 election based on exit polls at polling locations… when only ~27% of voters, biased more than 2:1 towards Trump voters,[3] voted in-person on Election Day.

Original Question:

“What do you think about the fact that 45% of Republicans said they strongly or somewhat support the storming of the Capitol? If you are one of those, why do you think so, and if you are Republican, but disagree, why do you think so many approve it?”

Footnotes
[1] Most voters say the events at the US Capitol are a threat to democracy | YouGov
[2] Dewey Defeats Truman - Wikipedia
[3] The 2020 voting experience: Coronavirus, mail concerns factored into deciding how to vote
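
The claim in the answer above that demographic weighting cannot repair this kind of sample bias can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: every number in it is hypothetical and is not taken from the YouGov poll or any other real survey. It assumes two demographic cells and assumes that, inside each cell, people who hold a given opinion are three times as likely to answer a snap poll as people who do not.

```python
# All figures are hypothetical and exist only to show the mechanism.
# share:      the cell's share of the population
# support:    true fraction of the cell holding the opinion
# p_resp_yes: chance that someone holding the opinion answers the poll
# p_resp_no:  chance that someone not holding it answers the poll
cells = {
    "younger": {"share": 0.4, "support": 0.10,
                "p_resp_yes": 0.06, "p_resp_no": 0.02},
    "older":   {"share": 0.6, "support": 0.20,
                "p_resp_yes": 0.03, "p_resp_no": 0.01},
}

def responding_yes(cell):
    """Fraction of the cell that both holds the opinion and responds."""
    return cell["support"] * cell["p_resp_yes"]

def responding_no(cell):
    """Fraction of the cell that does not hold the opinion and responds."""
    return (1 - cell["support"]) * cell["p_resp_no"]

def respondent_support(cell):
    """Expected support rate among the people in this cell who respond."""
    yes, no = responding_yes(cell), responding_no(cell)
    return yes / (yes + no)

# True population value.
true_support = sum(c["share"] * c["support"] for c in cells.values())

# Unweighted estimate: pool every respondent together.
all_yes = sum(c["share"] * responding_yes(c) for c in cells.values())
all_no = sum(c["share"] * responding_no(c) for c in cells.values())
raw_estimate = all_yes / (all_yes + all_no)

# Weighted estimate: rescale each cell back to its population share.
# This fixes the age mix but not who answers inside each cell.
weighted_estimate = sum(c["share"] * respondent_support(c)
                        for c in cells.values())

print(f"true={true_support:.3f} raw={raw_estimate:.3f} "
      f"weighted={weighted_estimate:.3f}")
```

Under these assumed numbers the weighted estimate stays far above the true value, because weighting only restores each cell's share of the sample. It has no information about the difference in response rates between the two opinion groups inside a cell.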

What are the general principles of when to use a probability vs. a non-probability sampling design?

Probabilistic designs:

Simple random sampling, i.e. everyone has the same probability of being sampled and the event that one person is sampled does not depend on who else was sampled, is the default design you will normally choose if there isn’t a good reason for doing anything else. It’s simple to model mathematically (the basic formulas for standard error etc. are based on this design) and there is minimal risk of messing up the implementation. The disadvantage is that it may accidentally produce a skewed sample, so sometimes you can do better.

Stratified sampling samples a fixed number of subjects from each of a number of strata, to make sure the sample is representative for that stratum. For example, if you need 100 participants, you may deliberately sample 50 men and 50 women. This can improve the accuracy but is more complicated to implement, and there’s also a risk that someone will know in advance that they will not be sampled because their stratum is already “full”, or that the interviewer will know which stratum someone belongs to. This can skew the results.

Weighted sampling is used when you want to give subjects with certain characteristics a bigger probability of being sampled, for example because they represent small groups that are important to have covered. Weighted sampling is usually a type of stratified sampling, but you can also calculate the weights individually based on continuous variables.

Cluster (multi-level) sampling: You first draw a sample of clusters, and then sample individuals within those clusters. This gives you bigger standard errors than simple random sampling but is often cheaper to implement; for example, it reduces travel costs for data collectors.

Non-probabilistic:

Systematic sampling is used when you want to minimize the impact of a continuous confounder. For example, in geological surveys you may want to sample points with a fixed distance between them so that you are sure no areas are left out. (If the starting point is chosen at random, this is usually counted as a probability design.)

Convenience sampling: You just sample whoever you can easily get. It is a very poor design, but it is cheap, so it may be OK for a pilot study or for experimental studies where sample representativeness doesn’t matter so much.
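
To make the designs above concrete, here is a minimal Python sketch of four of them using only the standard `random` module. The population of 1000 people, the `sex` and `district` fields, the sample sizes, and the function names are all hypothetical, and the random starting point in the systematic sampler is an assumption of this sketch rather than something the answer specifies.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1000 people, 500 per sex, 50 per district.
population = [
    {"id": i, "sex": "F" if i % 2 else "M", "district": i % 20}
    for i in range(1000)
]

def group_by(pop, key):
    """Split the population into lists keyed by one attribute."""
    groups = {}
    for person in pop:
        groups.setdefault(person[key], []).append(person)
    return groups

def simple_random_sample(pop, n):
    """Every person has the same chance; draws are without replacement."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Draw a fixed number of people from each stratum."""
    sample = []
    for members in group_by(pop, key).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters, n_per_cluster):
    """Draw clusters first, then people within the chosen clusters."""
    clusters = group_by(pop, key)
    sample = []
    for name in rng.sample(sorted(clusters), n_clusters):
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

def systematic_sample(pop, step):
    """Take every step-th person, starting from a random offset."""
    start = rng.randrange(step)
    return pop[start::step]

srs = simple_random_sample(population, 100)
strat = stratified_sample(population, "sex", 50)
clus = cluster_sample(population, "district", 5, 20)
sysm = systematic_sample(population, 10)
print(len(srs), len(strat), len(clus), len(sysm))
```

All four calls return 100 people here, but the cluster sample covers only 5 of the 20 districts. That is one way to see why cluster designs tend to give larger standard errors when people in the same cluster resemble each other.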

